Abstract

Sequential (one-by-one) rather than simultaneous estimation of multiple breaks is
investigated in this paper. The advantage of this method lies in its significant
computational savings: the number of least-squares operations required to compute all
of the break points is of order T, the sample size. Each estimated break point is shown
to be consistent for one of the true ones despite under-specification of the number of
breaks. More interestingly and somewhat surprisingly, the estimated break points are
shown to be T consistent, the same rate as under simultaneous estimation. Limiting
distributions are also derived. Unlike simultaneous estimation, however, the limiting
distributions are generally not symmetric and are influenced by the regression
parameters of all regimes. A simple method is introduced to obtain break point
estimators having the same limiting distributions as those obtained via simultaneous
estimation. Finally, a procedure is proposed to consistently estimate the number of
breaks.


Keywords and Phrases: Multiple breaks, sequential estimation, simultaneous
estimation, T consistency, limiting distribution, repartition method.


Running  Head:  MULTIPLE  BREAKS 


1.      Introduction 

Multiple  breaks  may  exist  in  the  trend  function  of  many  economic  time  series,  as 
suggested  by  the  studies  of  Cooper  (1995),  Garcia  and  Perron  (1994),  Papell  and 
Lumsdaine  (1995),  and  others.  This  paper  presents  some  theory  and  methods  for 
making  inferences  in  the  presence  of  multiple  breaks  with  unknown  break  dates.  The 
focus  is  the  sequential  method,  which  identifies  break  points  one  by  one  as  opposed 
to  all  at  once  simultaneously. 

A number of issues arise with the existence of multiple breaks. These include the
determination of the number of breaks, estimation of the break points given the
number, and statistical analysis of the resulting estimators. These issues are examined
by Bai and Perron (1994) under a different estimation approach. The major results of
Bai and Perron (1994) assume simultaneous estimation, which estimates all of the
breaks at the same time. Incidentally, taking advantage of dynamic programming, the
simultaneous method requires $O(T^2)$ least-squares operations irrespective of the
number of break points. In this paper we study an alternative method, which
sequentially identifies the break points. The procedure estimates one break point at a
time even if multiple breaks exist. The number of least-squares operations required to
compute all of the breaks is proportional to the sample size. Simultaneous and
sequential methods are not merely two different computing techniques; they are
fundamentally different methodologies that yield different estimators. Not much is
known about sequentially obtained estimators. This paper develops the underlying
theory about them.

The method of sequential estimation was proposed independently by Bai and Perron
(1994) and Chong (1994) (see also Bai (1994c) for an earlier exposition of the
method). They argued that the estimated break point is consistent for one of the true
break points. However, neither study gives the convergence rate of the estimated
break point. In fact, the approach used in the previous studies does not allow one to
study the convergence rate of sequential estimators. A different framework and more
detailed analysis are necessary. The framework used in this paper is adapted from Bai
(1994a). A major finding of this study is that the sequentially obtained estimated
break points are T consistent, the same rate as under simultaneous estimation. This
result is somewhat surprising in that, on first inspection, one might even doubt
consistency, let alone T consistency, in view of the incorrect specification of the
number of breaks.

Furthermore, we obtain the asymptotic distribution of the estimated break points.
The asymptotic distributions of sequentially estimated break points are found to be
different from those of simultaneous estimation. We suggest a procedure for obtaining
estimators having the same asymptotic distribution as the simultaneous estimators.
We also propose a procedure to consistently estimate the number of breaks. All these
latter results are made possible by the T consistency. For example, one can construct
consistent (but not T consistent) break-point estimators for which the procedure will
overestimate the number of breaks. In this light, the T consistency result for a
sequential estimator is particularly significant.

This  paper  is  organized  as  follows.  Section  2  states  the  model,  the  assumptions 
needed,  and  the  estimation  method.  The  T  consistency  for  the  estimated  break  points 
is  established  in  Section  3.  Section  4  studies  a  special  configuration  for  the  model's 
parameters  that  leads  to  some  interesting  asymptotic  results.  Limiting  distributions 
are  derived  in  Section  5.  Results  corresponding  to  more  than  two  breaks  are  stated  in 
Section  6.  The  issue  of  the  number  of  breaks  is  also  discussed  in  this  section.  Section 
7  proposes  the  "repartition  method"  that  gives  rise  to  estimators  having  the  same 
asymptotic  distribution  as  simultaneous  estimation.  Section  8  deals  with  shrinking 
shifts.  Convergence  rates  and  limiting  distributions  are  also  derived.  Section  9  states 
the  results  for  general  models.  Simulation  results  are  reported  in  Section  10.  The  last 
section  concludes.  Mathematical  proofs  are  provided  in  the  appendix. 

2.      The  Model 

To  present  the  major  idea  we  shall  consider  a  simple  model  with  mean  shifts  in  a 
linear  process.  The  whole  theory  and  results  can  be  elaborated  to  general  regression 
models  using  a  combination  of  the  argument  of  Bai  (1994b)  and  this  paper.  To  make 
the  matter  even  simpler,  the  presentation  and  proof  will  be  stated  in  terms  of  two 
breaks.  Because  of  the  nature  of  sequential  estimation,  the  analysis  in  terms  of  two 
breaks  incurs  no  loss  of  generality.  This  can  also  be  seen  from  the  proof.  The  general 
results  with  more  than  two  breaks  will  be  stated  later.  The  model  considered  is  as 
follows: 

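$$
\begin{aligned}
Y_t &= \mu_1 + X_t, &&\text{if } t \le k_1^0,\\
Y_t &= \mu_2 + X_t, &&\text{if } k_1^0 + 1 \le t \le k_2^0,\\
Y_t &= \mu_3 + X_t, &&\text{if } k_2^0 + 1 \le t \le T,
\end{aligned}
\tag{1}
$$

where $\mu_i$ is the mean of regime $i$ ($i = 1,2,3$) and $X_t$ is a linear process of
martingale differences such that

$$X_t = \sum_{j=0}^{\infty} a_j \varepsilon_{t-j}$$

with $a(1) = \sum_{j=0}^{\infty} a_j \ne 0$. We assume that $\mu_1 \ne \mu_2$ and
$\mu_2 \ne \mu_3$, so that there are two break points in the model. In addition, we
assume $k_1^0 = [T\tau_1^0]$ and $k_2^0 = [T\tau_2^0]$ with $\tau_1^0 < \tau_2^0$ and
$\tau_1^0, \tau_2^0 \in (0,1)$. The unknown parameters are $(\tau_1^0, \tau_2^0)$ [or
$(k_1^0, k_2^0)$] and $(\mu_1, \mu_2, \mu_3)$. The focus will be the break points
$(\tau_1^0, \tau_2^0)$, because once they are obtained, the regression parameters can
be easily computed.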
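
As an illustration, model (1) can be simulated as follows. This is a minimal sketch and
not part of the original analysis: the function name `simulate_mean_shift`, the finite
moving-average coefficients, and the Gaussian i.i.d. innovations are assumptions made
for the example (the text only requires a linear process of martingale differences).

```python
import numpy as np

def simulate_mean_shift(T, taus, mus, ma_coefs=(1.0,), sigma=1.0, seed=0):
    """Simulate Y_t = mu_i + X_t with X_t = sum_j a_j * eps_{t-j}.

    taus     : break fractions tau_1^0 < tau_2^0 < ... in (0, 1)
    mus      : regime means, one more entry than taus
    ma_coefs : (a_0, ..., a_q); a finite MA is an illustrative special case
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(ma_coefs, dtype=float)
    q = a.size - 1
    # Gaussian i.i.d. innovations: an assumed special case of martingale differences.
    eps = rng.normal(0.0, sigma, size=T + q)
    # "valid" convolution returns X_t = a_0 eps_t + ... + a_q eps_{t-q}, t = 1..T.
    x = np.convolve(eps, a, mode="valid")
    breaks = [int(np.floor(T * tau)) for tau in taus]   # k_i^0 = [T tau_i^0]
    edges = [0] + breaks + [T]
    mean = np.empty(T)
    for i, mu in enumerate(mus):
        mean[edges[i]:edges[i + 1]] = mu                # regime i covers (k_{i-1}^0, k_i^0]
    return mean + x, breaks
```

For instance, `simulate_mean_shift(200, (0.25, 0.5), (0.0, 1.0, 3.0))` returns a series
of length 200 together with the break dates `[50, 100]`.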

The  main  thrust  of  sequential  estimation  is  "one  break  at  a  time."  The  model 
is  treated  as  if  there  were  only  one  break  point.  Estimating  one  break  point  for  a 
mean  shift  in  linear  processes  is  studied  by  Bai  (1994a).  A  single  break  point  can  be 
obtained  by  minimizing  the  sum  of  squared  residuals  among  all  possible  sample  splits. 
As in Bai (1994a), we denote the mean of the first $k$ observations by $\bar Y_k$ and
the mean of the last $T - k$ observations by $\bar Y_k^*$. The sum of squared residuals
is

$$S_T(k) = \sum_{t=1}^{k} (Y_t - \bar Y_k)^2 + \sum_{t=k+1}^{T} (Y_t - \bar Y_k^*)^2.$$

A break point estimator is defined as

$$\hat k = \operatorname{argmin}_{1 \le k \le T-1} S_T(k).$$

Using the formula linking total variance with within-group and between-group
variances, we can write, for each $k$ ($1 \le k \le T - 1$),

$$\sum_{t=1}^{T} (Y_t - \bar Y)^2 = S_T(k) + T\,V_T(k)^2 \tag{2}$$

where $\bar Y$ is the overall mean and

$$V_T(k) = \left(\frac{k(T-k)}{T^2}\right)^{1/2} (\bar Y_k^* - \bar Y_k). \tag{3}$$

It follows that

$$\hat k = \operatorname{argmin}_k S_T(k) = \operatorname{argmax}_k V_T(k)^2 = \operatorname{argmax}_k |V_T(k)|.$$

Consequently, the properties of $\hat k$ can be analyzed equivalently by examining
$S_T(k)$ or $V_T(k)$. We define $\hat\tau = \hat k/T$. Both $\hat\tau$ and $\hat k$ are
referred to as the estimated break points. The former is also referred to as the
estimated break fraction.
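
The estimator and identity (2) can be checked numerically. The sketch below is
illustrative only; the function names `ssr_profile`, `v_stat`, and `estimate_break` are
hypothetical. It evaluates $S_T(k)$ for every $k$ from cumulative sums, so that the
whole profile costs on the order of $T$ operations.

```python
import numpy as np

def ssr_profile(y):
    """S_T(k) for k = 1, ..., T-1, computed from cumulative sums."""
    y = np.asarray(y, dtype=float)
    T = y.size
    k = np.arange(1, T)                                  # candidate break dates
    c1, c2 = np.cumsum(y), np.cumsum(y ** 2)
    left = c2[k - 1] - c1[k - 1] ** 2 / k                # sum_{t<=k} (Y_t - mean of first k)^2
    right = (c2[-1] - c2[k - 1]) - (c1[-1] - c1[k - 1]) ** 2 / (T - k)
    return left + right

def v_stat(y):
    """V_T(k) of equation (3) for k = 1, ..., T-1."""
    y = np.asarray(y, dtype=float)
    T = y.size
    k = np.arange(1, T)
    c1 = np.cumsum(y)
    mean_first = c1[k - 1] / k
    mean_last = (c1[-1] - c1[k - 1]) / (T - k)
    return np.sqrt(k * (T - k)) / T * (mean_last - mean_first)

def estimate_break(y):
    """Return (k_hat, tau_hat), the minimizer of S_T(k) and the break fraction."""
    k_hat = int(np.argmin(ssr_profile(y))) + 1           # +1: k is 1-based
    return k_hat, k_hat / len(y)
```

Because of (2), minimizing `ssr_profile(y)` and maximizing `abs(v_stat(y))` select the
same date.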

One of our major results is that $\hat\tau$ is T consistent for one of the true breaks
$\tau_i^0$. It should be pointed out, however, that $\hat k$ itself is not consistent for
any of the $k_i^0$ ($i = 1,2$). For ease of exposition, we shall frequently say that
$\hat k$ is T consistent, with the understanding that we are actually referring to
$\hat\tau$.


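Below are assumptions that guarantee T consistency.

Assumption A1. The $\varepsilon_t$ are martingale differences satisfying
$E(\varepsilon_t \mid \mathcal{F}_{t-1}) = 0$, $E\varepsilon_t^2 = \sigma^2$, and there
exists a $\delta > 0$ such that $\sup_t E|\varepsilon_t|^{2+\delta} < \infty$, where
$\mathcal{F}_t$ is the $\sigma$-field generated by $\varepsilon_s$ for $s \le t$.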

Assumption A2. $\sum_{j=0}^{\infty} j\,|a_j| < \infty$.

Assumption A3. $\mu_i \ne \mu_{i+1}$, $k_i^0 = [T\tau_i^0]$, and $\tau_i^0 \in (0,1)$ ($i = 1,2$) with $\tau_1^0 < \tau_2^0$.

These assumptions are used in Bai (1994a), except that A3 is stated there in terms of
a single break. Assumptions A1 and A2 are standard for linear processes. A3 assumes
that there are two breaks. The next section proves the T consistency of $\hat\tau$ for
one of the break points. The identification of the other break point will also be
considered.

3.      Consistency  and  Rate  of  Convergence 

In this section, a number of useful properties for the sum of squared residuals $S_T(k)$
will be presented. These properties lead to the consistency result naturally. Write
$U_T(\tau) = T^{-1} S_T([T\tau])$ for $\tau \in [0,1]$. We define both $S_T(0)$ and
$S_T(T)$ as the total sum of squared residuals with the full sample, i.e.
$S_T(0) = S_T(T) = \sum_{t=1}^{T} (Y_t - \bar Y)^2$. This definition is also consistent
with (2), as $V_T(0) = V_T(T) = 0$. In this way, $U_T(\tau)$ is well defined for all
$\tau \in [0,1]$.

Lemma 1. Under A1-A3, $U_T(\tau)$ converges uniformly in probability to a nonstochastic
function $U(\tau)$ on $[0,1]$.

The limit $U(\tau)$ is a continuous function and has different expressions over three
different regimes. In particular,

$$U(\tau_1^0) = \sigma_X^2 + \frac{(1-\tau_2^0)(\tau_2^0-\tau_1^0)}{1-\tau_1^0}\,(\mu_2-\mu_3)^2 \tag{4}$$

and

$$U(\tau_2^0) = \sigma_X^2 + \frac{\tau_1^0}{\tau_2^0}\,(\tau_2^0-\tau_1^0)(\mu_1-\mu_2)^2 \tag{5}$$

where $\sigma_X^2 = EX_t^2$.

Lemma 2. Under assumptions A1-A3,

$$\sup_{1 \le k \le T} \left| U_T(k/T) - EU_T(k/T) \right| = O_p(T^{-1/2}).$$


This lemma says that the objective function (as a function of $k$) is uniformly
close to its expected function. As a result, if the expected function is minimized at a
certain point, then the stochastic function will be minimized in a neighborhood of that
point with large probability. To study the extreme value of the expected function, we
need an additional assumption, which is stated in terms of the limiting function
$U(\tau)$. Typically, the function $U(\tau)$ has two local minima. To ensure the
smallest value of $U(\tau)$ is unique, we assume:

Assumption A4. $U(\tau_1^0) < U(\tau_2^0)$.

This condition guarantees the uniqueness of the global minimum of $U(\tau)$. The
condition is equivalent to, by (4) and (5),

$$\frac{\tau_1^0}{\tau_2^0}\,(\mu_1-\mu_2)^2 > \frac{1-\tau_2^0}{1-\tau_1^0}\,(\mu_2-\mu_3)^2. \tag{6}$$

Evidently, the condition assumes that the first break is more dominating in terms of
the relative span of regimes and the magnitude of shifts. In other words, when the
first break is more pronounced (larger $\tau_1^0$ and/or larger $|\mu_1 - \mu_2|$), A4
will be true. The inequality will be reversed when the second break is more pronounced.
Under A4 together with A1-A3, the estimated fraction $\hat\tau$ converges in
probability to $\tau_1^0$. This is true because the sum of squared residuals is reduced
the most only if the more pronounced break is chosen. If the inequality in A4 is
reversed, then by mere symmetry, $\hat\tau$ converges in probability to $\tau_2^0$. In
the next section, we examine the case in which $U(\tau_1^0) = U(\tau_2^0)$. Under this
condition, we show that $\hat\tau$ converges in distribution to a random variable with
equal mass at $\tau_1^0$ and $\tau_2^0$ only. Incidentally, the set of parameters
$\{(\tau_1^0, \tau_2^0, \mu_1, \mu_2, \mu_3)\}$ that makes $U(\tau_1^0) = U(\tau_2^0)$
defines a subset of $\mathbb{R}^5$ having Lebesgue measure zero.
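
To see which break the one-break fit selects for given parameters, the limiting values
at the two true break fractions can be compared directly. The following is a small
sketch under stated assumptions: the function names are hypothetical, and
$\sigma_X^2$ is set to 1 by default because it cancels in the comparison.

```python
import math

def limit_values(tau1, tau2, mu1, mu2, mu3, sigma_x2=1.0):
    """Limiting objective at tau_1^0 and tau_2^0 for the two-break mean-shift model."""
    u1 = sigma_x2 + (1 - tau2) * (tau2 - tau1) / (1 - tau1) * (mu2 - mu3) ** 2
    u2 = sigma_x2 + tau1 / tau2 * (tau2 - tau1) * (mu1 - mu2) ** 2
    return u1, u2

def dominant_break(tau1, tau2, mu1, mu2, mu3):
    """Return 1 if U(tau_1^0) < U(tau_2^0), 2 if the inequality is reversed, 0 for a tie."""
    u1, u2 = limit_values(tau1, tau2, mu1, mu2, mu3)
    if math.isclose(u1, u2, rel_tol=1e-9):
        return 0                      # the equal-value case, a measure-zero configuration
    return 1 if u1 < u2 else 2
```

With break fractions (0.3, 0.7), means (0, 2, 3) make the first break dominant, whereas
means (0, 1, 3) make the second break dominant. The symmetric configuration with
fractions (1/3, 2/3) and means (0, 1, 0) gives a tie.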

Lemma 3. Under assumptions A1-A4, there exists a $C > 0$, only depending on $\tau_i^0$
and $\mu_j$ ($i = 1,2$, $j = 1,2,3$), such that

$$ES_T(k) - ES_T(k_1^0) \ge C\,|k - k_1^0| \quad \text{for all large } T.$$

The lemma implies that the expected value of the sum of squared residuals is
minimized at $k_1^0$ only. As mentioned earlier, because of the uniform closeness of the
objective function to its expected function by Lemma 2, it is reasonable to expect
that the minimum point of the stochastic objective function is close to $k_1^0$ with
large probability. Precisely, we have

Proposition 1. Under assumptions A1-A4,

$$\hat\tau - \tau_1^0 = O_p(T^{-1/2}).$$

That is, the estimated break point is consistent for $\tau_1^0$.

This proposition not only establishes consistency but also gives a convergence rate.
Proof:

$$
\begin{aligned}
S_T(k) - S_T(k_1^0) &= S_T(k) - ES_T(k) - [S_T(k_1^0) - ES_T(k_1^0)] + ES_T(k) - ES_T(k_1^0)\\
&\ge -2 \sup_{1 \le j \le T} |S_T(j) - ES_T(j)| + ES_T(k) - ES_T(k_1^0)\\
&\ge -2 \sup_{1 \le j \le T} |S_T(j) - ES_T(j)| + C\,|k - k_1^0| \quad \text{by Lemma 3.}
\end{aligned}
$$

The above holds for all $k \in [1, T]$. In particular, it holds for $\hat k$. From
$S_T(\hat k) - S_T(k_1^0) \le 0$, we obtain

$$|\hat k - k_1^0| \le C^{-1}\, 2 \sup_{1 \le j \le T} |S_T(j) - ES_T(j)|.$$

Dividing the above inequality by $T$ on both sides and using Lemma 2, we obtain the
proposition immediately.        □

The above convergence rate is obtained by examining the global behavior of the
objective function. We can use this initial rate of convergence to obtain a better rate.
Define $D_T = \{k : T\eta \le k \le T\tau_2^0(1-\eta)\}$, where $\eta$ is a small
positive number such that $\tau_1^0 \in (\eta, \tau_2^0(1-\eta))$, and
$D_M = \{k : |k - k_1^0| \le M\}$, where $M < \infty$ is a constant. Thus each
$k \in D_T$ is both away from 1 and away from the second break point by a positive
fraction of observations. By Proposition 1, $\hat k$ will eventually fall into $D_T$.
That is, for every $\epsilon > 0$, $P(\hat k \notin D_T) < \epsilon$ for all large $T$.
We shall argue that $\hat k$ must eventually fall into $D_M$ with large probability for
large $M$, which is equivalent to T consistency.

Let $D_{T,M}$ be the intersection of $D_T$ and the complement of $D_M$, that is,
$D_{T,M} = \{k : T\eta \le k \le T\tau_2^0(1-\eta),\ |k - k_1^0| > M\}$.

Lemma 4. Under A1-A4, for every $\epsilon > 0$, there exists an $M < \infty$ such that

$$P\Big( \min_{k \in D_{T,M}} S_T(k) - S_T(k_1^0) \le 0 \Big) < \epsilon.$$

Proposition 2. Under assumptions A1-A4, for every $\epsilon > 0$, there exists an
$M < \infty$ such that

$$P(T\,|\hat\tau - \tau_1^0| > M) < \epsilon.$$

That is, the break point estimator is T consistent.



Proof: Because $S_T(\hat k) \le S_T(k_1^0)$, if $\hat k \in A$, it must be the case that
$\min_{k \in A} S_T(k) \le S_T(k_1^0)$, where $A$ is an arbitrary subset of integers.
Thus

$$
\begin{aligned}
P(|\hat k - k_1^0| > M) &\le P(\hat k \notin D_T) + P(\hat k \in D_T,\ |\hat k - k_1^0| > M)\\
&\le \epsilon + P\Big( \min_{k \in D_{T,M}} S_T(k) - S_T(k_1^0) \le 0 \Big) \le 2\epsilon
\end{aligned}
$$

by Lemma 4. This proves the proposition.        □

The rate of convergence is identical to that of simultaneous estimators; see Bai and
Perron (1994).

If it is known that $\hat k/T$ is consistent for $\tau_1^0$, then an estimate for
$\tau_2^0$ can be easily constructed. Just apply the same technique to the subsample
$[\hat k, T]$. Let $\hat k_2$ denote the resulting estimator. Then
$\hat\tau_2 = \hat k_2/T$ must be T consistent for $\tau_2^0$. This follows because
$\hat k = k_1^0 + O_p(1)$, so one is effectively using $[k_1^0, T]$ to estimate the
second break. Moreover, even if $\hat k < k_1^0$, the dominating break in the subsample
$[\hat k, T]$ can only be $k_2^0$. Thus our previous analysis implies the T consistency
of $\hat\tau_2$ for $\tau_2^0$.

In summary, T consistent estimators for $\tau_1^0$ and $\tau_2^0$ can be obtained by
the sequential procedure. The total number of least-squares operations required is no
more than $2T$.
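
The two-stage procedure just described can be sketched as follows. The code is an
illustration, not the author's implementation: the function names are hypothetical, the
first-stage estimate is assumed to pick up the first break (as under A4), and the
second-stage subsample is taken to start at observation $\hat k + 1$, which differs
from $[\hat k, T]$ by a single observation.

```python
import numpy as np

def argmin_ssr(y):
    """1-based split point minimizing the one-break sum of squared residuals."""
    y = np.asarray(y, dtype=float)
    n = y.size
    k = np.arange(1, n)
    c1, c2 = np.cumsum(y), np.cumsum(y ** 2)
    ssr = (c2[k - 1] - c1[k - 1] ** 2 / k) \
        + (c2[-1] - c2[k - 1] - (c1[-1] - c1[k - 1]) ** 2 / (n - k))
    return int(np.argmin(ssr)) + 1

def two_stage_breaks(y):
    """First-stage estimate on the full sample, second stage on the remaining subsample."""
    y = np.asarray(y, dtype=float)
    k_first = argmin_ssr(y)                       # one-break fit on [1, T]
    # Assumes the remaining subsample has at least two observations.
    k_second = k_first + argmin_ssr(y[k_first:])  # one-break fit on observations k_first+1..T
    return k_first, k_second
```

Each stage evaluates the sum of squared residuals at no more than $T$ candidate dates,
in line with the bound of $2T$ stated above.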

4.      The case of $U(\tau_1^0) = U(\tau_2^0)$

When $U(\tau_1^0) = U(\tau_2^0)$, it is easy to show that $U(\tau)$, as a function of
$\tau$, has two local minima at $\tau_1^0$ and $\tau_2^0$, respectively. This leads to
the conjecture that the estimated break point $\hat\tau$ may converge in distribution
to a random variable with mass at $\tau_1^0$ and $\tau_2^0$ only. Indeed we have the
following result:

Proposition 3. Under A1-A3 together with $U(\tau_1^0) = U(\tau_2^0)$, the estimator
$\hat\tau$ converges in distribution to a random variable with equal mass at
$\tau_1^0$ and $\tau_2^0$. Furthermore, $\hat\tau$ converges either to $\tau_1^0$ or to
$\tau_2^0$ at rate $T$ in the sense that for every $\epsilon > 0$, there exists an
$M < \infty$ such that

$$P\big( |T(\hat\tau - \tau_1^0)| > M \ \text{and}\ |T(\hat\tau - \tau_2^0)| > M \big) < \epsilon.$$

To prove the proposition, we need a number of preliminary results. Analogous to
Lemma 3, we have

Lemma 5. Under the assumptions of Proposition 3, there exists $C > 0$ such that for
all large $T$,

$$ES_T(k) - ES_T(k_1^0) \ge C\,|k - k_1^0| \quad \forall\, k \le k_0^*,$$

$$ES_T(k) - ES_T(k_2^0) \ge C\,|k - k_2^0| \quad \forall\, k \ge k_0^*,$$

where $k_0^* = (k_1^0 + k_2^0)/2$.

The choice of $k_0^*$ in the above fashion is not essential. Any number in between
$k_1^0$ and $k_2^0$ while bounded away from $k_1^0$ and $k_2^0$ by a positive fraction
of observations is equally valid.

Let $\hat k_1$ be the location of the minimum of $S_T(k)$ for $k$ such that
$k \le k_0^*$, that is, $\hat k_1 = \operatorname{argmin}_{k \le k_0^*} S_T(k)$. Let
$\hat k_2 = \operatorname{argmin}_{k > k_0^*} S_T(k)$. Note that this $\hat k_2$ is
different from the one defined in the previous section. Also, $\hat k_1$ and $\hat k_2$
are not estimators as $k_0^*$ is unknown. They are introduced here for theoretical
purposes. It is clear that the global minimizer $\hat k$ satisfies:

$$
\hat k =
\begin{cases}
\hat k_1 & \text{if } S_T(\hat k_1) < S_T(\hat k_2)\\
\hat k_2 & \text{if } S_T(\hat k_1) > S_T(\hat k_2).
\end{cases}
\tag{7}
$$

Note that $P(S_T(\hat k_1) = S_T(\hat k_2)) = 0$ if $X_t$ has a continuous
distribution. Even without the assumption of continuous distribution for $X_t$, because
$T^{-1/2}\{S_T(\hat k_1) - S_T(\hat k_2)\}$ converges in distribution to a normal random
variable (see the proof of Lemma 7 below), the event
$\{S_T(\hat k_1) = S_T(\hat k_2)\}$ has a probability approaching zero as the sample
size increases.

Let $\hat\tau_i = \hat k_i/T$ ($i = 1,2$). Using Lemma 2 and Lemma 5, we can easily
obtain the following result analogous to Proposition 1:

$$\hat\tau_1 - \tau_1^0 = O_p(T^{-1/2}), \qquad \hat\tau_2 - \tau_2^0 = O_p(T^{-1/2}).$$

The root-T consistency is strengthened to T consistency using the following:

Lemma 6. Under the assumptions of Proposition 3, for every $\epsilon > 0$, there exists
an $M > 0$ such that

$$P\Big( \min_{k \in D_{T,M}^{(i)}} S_T(k) - S_T(k_i^0) \le 0 \Big) < \epsilon, \quad \text{for } i = 1,2,$$

where

$$D_{T,M}^{(1)} = \{k : T\eta \le k \le k_0^*,\ |k - k_1^0| > M\},$$

$$D_{T,M}^{(2)} = \{k : k_0^* + 1 \le k \le T(1-\eta),\ |k - k_2^0| > M\}.$$

Lemma 6 together with the consistency result implies the T consistency of $\hat k_i$
in the same way as Lemma 4 (together with the consistency) implies the T consistency of
$\hat k$ in Section 3. Using the T consistency, we can prove
Lemma 7. Under the assumptions of Proposition 3,

$$\lim_{T \to \infty} P(\hat k = \hat k_i) = 1/2, \quad i = 1,2.$$

Proof of Proposition 3. By Lemma 7, $P(\hat\tau = \hat\tau_i) \to 1/2$ ($i = 1,2$).
Since $\hat\tau_i \xrightarrow{p} \tau_i^0$, it follows that $\hat\tau$ converges in
distribution to a random variable with equal mass at $\tau_1^0$ and $\tau_2^0$. The
second part of the proposition follows from the T consistency of $\hat\tau_i$.       □

It is clear that $\hat k$ is a good estimator for one of the breaks. If it is known
that $\hat k$, for a given sample, is estimating $k_1^0$, then we can use the subsample
$[\hat k, T]$ to estimate $k_2^0$. Note that this second stage estimator is not
necessarily equal to $\hat k_2$, as the latter is based on the sum of squared residuals
using the entire sample. Similarly, if $\hat k$ is estimating $k_2^0$, we can use the
sample $[1, \hat k]$ to estimate $k_1^0$. Let $(\hat k^{(1)}, \hat k^{(2)})$ denote the
ordered pair of the first stage and the second stage estimators such that
$\hat k^{(1)} < \hat k^{(2)}$. It follows that the ordered pair forms a T consistent
estimator for $(k_1^0, k_2^0)$.

5.      Limiting  Distribution 

Given  the  rate  of  convergence,  it  is  relatively  easy  to  derive  the  limiting  distributions. 
We  strengthen  the  assumption  of  second  order  stationarity  to  strict  stationarity. 

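Assumption A5. The process $\{X_t\}$ is strictly stationary.$^1$

Let $\{X_t^{(1)}\}$ be an independent copy of the process $\{X_t\}$. Define
$W^{(1)}(\ell, \lambda) = W_1^{(1)}(\ell, \lambda)$ for $\ell < 0$,
$W^{(1)}(\ell, \lambda) = W_2^{(1)}(\ell, \lambda)$ for $\ell > 0$, and
$W^{(1)}(0, \lambda) = 0$, where

$$W_1^{(1)}(\ell, \lambda) = -2(\mu_2 - \mu_1) \sum_{t=\ell+1}^{0} X_t^{(1)} + |\ell|\,(\mu_2 - \mu_1)^2 (1 + \lambda), \quad \ell = -1, -2, \ldots$$

$$W_2^{(1)}(\ell, \lambda) = 2(\mu_2 - \mu_1) \sum_{t=1}^{\ell} X_t^{(1)} + \ell\,(\mu_2 - \mu_1)^2 (1 - \lambda), \quad \ell = 1, 2, \ldots$$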

Proposition 4. Under assumptions A1-A5, together with the assumption of continuous
distribution for $X_t$,

$$\hat k - k_1^0 \xrightarrow{d} \operatorname{argmin}_{\ell} W^{(1)}(\ell, \lambda_1),$$

where

$$\lambda_1 = \frac{1 - \tau_2^0}{1 - \tau_1^0} \cdot \frac{\mu_3 - \mu_2}{\mu_2 - \mu_1}.$$

$^1$ This assumption allows one to express the limiting distribution free from the
change point ($k_1^0$). The assumption can be dispensed with; see Bai (1994b).


Note that condition A4 [or equivalently (6)] guarantees that $|\lambda_1| < 1$. The
assumption of continuous distribution ensures the uniqueness of the global minimum for
the process $W^{(1)}(\ell, \lambda_1)$, so that
$\operatorname{argmin}_{\ell} W^{(1)}(\ell, \lambda_1)$ is well defined. The proof of
this proposition is provided in the Appendix.

When $\lambda$ is zero, the limiting distribution corresponds to that of a single break
($\mu_3 = \mu_2$) or to that of the first break point estimator in the case of multiple
breaks with simultaneous estimation. If $X_t$ has a symmetric distribution and
$\lambda$ is equal to zero, $W^{(1)}(\ell, \lambda)$ and $W^{(1)}(-\ell, \lambda)$ will
have the same distribution and consequently $\hat k - k_1^0$ will have a symmetric
distribution. Because $\lambda \ne 0$ generally, the limiting distribution from
sequential estimation is not symmetric about zero. For positive $\lambda$ (or
equivalently, when $\mu_2 - \mu_1$ and $\mu_3 - \mu_2$ have the same sign), the drift
term of $W_2^{(1)}(\ell, \lambda)$ is smaller than that of $W_1^{(1)}(\ell, \lambda)$.
This implies that the distribution of $\hat k$ will have a heavy right tail, that is, a
tendency to overestimate the break point relative to simultaneous estimation. For
negative $\lambda$, there is a tendency to underestimate the break point. These
theoretical implications are all borne out by Monte Carlo simulations.
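
The asymmetry can be illustrated by simulating the argmin of the two-sided process
directly. The sketch below is not the simulation design of the paper: the function name
`simulate_argmin_w`, the Gaussian i.i.d. choice for $X_t^{(1)}$, and the truncation of
$\ell$ to a finite horizon are all assumptions made for the example. Here `delta`
stands for $\mu_2 - \mu_1$.

```python
import numpy as np

def simulate_argmin_w(delta, lam, reps=2000, horizon=200, sigma=1.0, seed=0):
    """Draws of argmin over |ell| <= horizon of the two-sided process W^(1)(ell, lam)."""
    rng = np.random.default_rng(seed)
    n = np.arange(1, horizon + 1)
    ells = np.arange(-horizon, horizon + 1)
    draws = np.empty(reps, dtype=int)
    for r in range(reps):
        x_left = rng.normal(0.0, sigma, horizon)    # X_0, X_{-1}, X_{-2}, ...
        x_right = rng.normal(0.0, sigma, horizon)   # X_1, X_2, ...
        # Branch for ell = -1, -2, ...: drift (1 + lam) per step.
        w_left = -2 * delta * np.cumsum(x_left) + n * delta ** 2 * (1 + lam)
        # Branch for ell = 1, 2, ...: drift (1 - lam) per step.
        w_right = 2 * delta * np.cumsum(x_right) + n * delta ** 2 * (1 - lam)
        w = np.concatenate([w_left[::-1], [0.0], w_right])   # ordered by ell, W(0) = 0
        draws[r] = ells[np.argmin(w)]
    return draws
```

Comparing, say, `simulate_argmin_w(1.0, 0.5).mean()` with
`simulate_argmin_w(1.0, -0.5).mean()` shows draws shifted to the right for positive
`lam` and to the left for negative `lam`, in line with the discussion above.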

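Suppose the inequality in Assumption A4 is reversed, i.e. $U(\tau_1^0) > U(\tau_2^0)$.
Then by mere symmetry,

$$\hat k - k_2^0 \xrightarrow{d} \operatorname{argmin}_{\ell} W^{(2)}(\ell, \lambda_2)$$

where

$$
W^{(2)}(\ell, \lambda) =
\begin{cases}
-2(\mu_3 - \mu_2) \sum_{t=\ell+1}^{0} X_t^{(2)} + |\ell|\,(\mu_3 - \mu_2)^2 (1 + \lambda), & \ell = -1, -2, \ldots\\
2(\mu_3 - \mu_2) \sum_{t=1}^{\ell} X_t^{(2)} + \ell\,(\mu_3 - \mu_2)^2 (1 - \lambda), & \ell = 1, 2, \ldots
\end{cases}
$$

and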


$$\lambda_2 = \frac{\tau_1^0}{\tau_2^0} \cdot \frac{\mu_1 - \mu_2}{\mu_3 - \mu_2},$$


with $\{X_t^{(2)}\}$ being an independent copy of the process $\{X_t\}$, and being also
independent of $\{X_t^{(1)}\}$.

As discussed in Section 3, when $\hat k/T$ is consistent for $\tau_1^0$, an estimate
for $\tau_2^0$ can be obtained by applying the same technique to the subsample
$[\hat k, T]$. Let $\hat k_2$ denote the resulting estimator. We have argued that
$\hat\tau_2 = \hat k_2/T$ is T consistent for $\tau_2^0$. Moreover, we shall prove that
the limiting distribution of $\hat k_2 - k_2^0$ is the same as that from a single break
model. More precisely,

Proposition 5. Under assumptions A1-A5,

$$\hat k_2 - k_2^0 \xrightarrow{d} \operatorname{argmin}_{\ell} W^{(2)}(\ell, 0)$$

and is independent of $\hat k - k_1^0$ asymptotically.

The proof is given in the appendix. The asymptotic independence follows because
$\hat k$ and $\hat k_2$ are determined by increasingly distant observations that are
only weakly dependent.

Similarly, if $\hat k/T$ is consistent for $\tau_2^0$, then one can use the sample
$[1, \hat k]$ to estimate $\tau_1^0$. The resulting estimator must be T consistent. The
limiting distribution is given by $\operatorname{argmin}_{\ell} W^{(1)}(\ell, 0)$.

We now consider the case in which $U(\tau_1^0) = U(\tau_2^0)$. As in Section 4, let
$(\hat k^{(1)}, \hat k^{(2)})$ denote the ordered pair of the first and second stage
estimators. Then we have the following result: for $i = 1,2$,

$$
\hat k^{(i)} - k_i^0 \xrightarrow{d}
\begin{cases}
\operatorname{argmin}_{\ell} W^{(i)}(\ell, \lambda_i) & \text{with probability } 1/2\\
\operatorname{argmin}_{\ell} W^{(i)}(\ell, 0) & \text{with probability } 1/2.
\end{cases}
$$

This is true because, in the limit, with probability 1/2, $\hat k^{(i)}$ is the first
stage estimator, and with probability 1/2, it is the second stage estimator. When
$\hat k^{(1)}$ is the first stage estimator, its limiting distribution is given by
$\operatorname{argmin}_{\ell} W^{(1)}(\ell, \lambda_1)$. When $\hat k^{(1)}$ is the
second stage estimator, its limiting distribution is given by
$\operatorname{argmin}_{\ell} W^{(1)}(\ell, 0)$ because it is estimated effectively with
the sample $[1, k_2^0]$, which contains only a single break. The argument for
$\hat k^{(2)}$ is similar.

6.      More  Than  Two  Breaks 

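In this section, we extend the procedure and the theoretical results to general multiple
break points:

$$
\begin{aligned}
Y_t &= \mu_1 + X_t, &&\text{if } t \le k_1^0,\\
Y_t &= \mu_2 + X_t, &&\text{if } k_1^0 + 1 \le t \le k_2^0,\\
&\ \,\vdots\\
Y_t &= \mu_{m+1} + X_t, &&\text{if } k_m^0 + 1 \le t \le T,
\end{aligned}
\tag{8}
$$

where $\mu_i \ne \mu_{i+1}$, $k_i^0 = [T\tau_i^0]$, $\tau_i^0 \in (0,1)$, and
$\tau_i^0 < \tau_{i+1}^0$ for $i = 1, \ldots, m$ with $\tau_{m+1}^0 = 1$. Assume the
process $X_t$ satisfies A1-A2.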

Define the quantities $S_T(k)$, $V_T(k)$, and $U_T(\tau)$ as before, and denote by
$U(\tau)$ the limit of $U_T(\tau)$. Again, let
$\hat k = \operatorname{argmin}_k S_T(k)$ and $\hat\tau = \hat k/T$. From the proofs of
the earlier results in the appendix we can see that the assumption of two breaks is not
essential. With more than two breaks, one just needs to deal with extra terms. The
argument is virtually identical. Therefore we state the major results without proof.
First we impose the following:

Assumption A6. There exists an $i$ such that $U(\tau_i^0) < U(\tau_j^0)$ for all $j \ne i$.
Proposition 6. Under assumptions A1-A3 and A6, the estimated break point $\hat\tau$ is
T consistent for $\tau_i^0$.

Proposition 7. Under the assumptions of Proposition 6 and A5,

$$\hat k - k_i^0 \xrightarrow{d} \operatorname{argmin}_{\ell} W^{(i)}(\ell, \lambda_i)$$

where $W^{(i)}(\ell, \lambda)$ has the same form as $W^{(1)}(\ell, \lambda)$ with
$(\mu_2 - \mu_1)$ replaced by $(\mu_{i+1} - \mu_i)$ and


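$$\lambda_i = \frac{1}{\mu_{i+1} - \mu_i} \left[ \frac{1}{1 - \tau_i^0} \sum_{j=i+1}^{m} (1 - \tau_j^0)(\mu_{j+1} - \mu_j) - \frac{1}{\tau_i^0} \sum_{j=1}^{i-1} \tau_j^0 (\mu_{j+1} - \mu_j) \right].$$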


Again, assumption A6 ensures that $|\lambda_i| < 1$.

A new terminology is appropriate here. A subsample $[k, \ell]$ is said to contain a
nontrivial break point if both $k$ and $\ell$ are bounded away from the break point by
a positive fraction of observations. That is, $k^0 - k \ge T\epsilon_0$ and
$\ell - k^0 \ge T\epsilon_0$ for some $\epsilon_0 > 0$ and for all large $T$, where
$k^0$ is a break point inside $[k, \ell]$. This definition rules out subsamples such as
$[1, k]$ where $k = k_1^0 + O_p(1)$.

When it is known that the subsample $[1, \hat k]$ contains at least one nontrivial break
point, the same procedure can be used to estimate a break point based on the sample
$[1, \hat k]$. That is, the second break point is defined as the location where the sum
of squared residuals computed from this subsample is minimized over the range
$[1, \hat k]$. The resulting estimator must be T consistent for one of the break points,
assuming again that assumption A6 holds for this subsample. Furthermore, the resulting
estimator has a limiting distribution as if the sample $[1, k_i^0]$ were used and thus
has no connection with parameters in the sample $[k_i^0 + 1, T]$. This is because the
first stage estimator $\hat k/T$ is T consistent for $\tau_i^0$. A similar conclusion
applies to the interval $[\hat k, T]$. Therefore, second round estimation may yield an
additional two breaks, and consequently, up to 4 subintervals are to be considered in
the third round estimation. This procedure is repeated until each resulting subsample
contains no nontrivial break point. Given knowledge of the number of breaks as well as
of the existence of a nontrivial break in a given subsample, all the breaks can be
identified and all the estimated break fractions are T consistent. The total number of
least-squares operations required is no more than $mT$; here, $m$ is the number of
breaks.

A problem arises immediately in practice as to whether a subsample contains a
nontrivial break, which is clearly tied up with the determination of the number of
breaks. We suggest that the decision be made based on testing the hypothesis of
parameter constancy for the subsample. We prove in the next section that such a
decision rule leads to consistent estimation of the number of breaks, and implicitly to
a correct judgment about the existence of a nontrivial break in a given subsample.

6.1.      Determination  of  the  number  of  breaks 

The number of breaks, m, is unknown in practice. We show how the sequential
procedure coupled with hypothesis testing can yield a consistent estimate of the
true number of breaks. The procedure works in a way similar to that described in the
previous section. Along the way, hypothesis testing is used as an auxiliary tool to
determine the existence of a break point in a given subsample. We summarize the
procedure here. When the first break point is identified, the whole sample is divided
into two subsamples, with the first subsample consisting of the first k observations
and the second subsample consisting of the rest of the observations. We then test
parameter constancy for each subsample, estimating a break point for any subsample
where constancy is rejected. The corresponding subsample is then divided further at
the newly estimated break point, and parameter constancy tests are performed for the
hierarchically obtained subsamples. This procedure is repeated until parameter
constancy is not rejected for any of the sequentially obtained subsamples. The number
of break points is equal to the number of subsamples minus 1.
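The splitting rule just summarized can be sketched in a few lines of Python for the pure mean-shift case. This is only an illustration under stated assumptions, not the GAUSS program mentioned in the paper: the function names `sup_f` and `estimate_breaks`, the trimming fraction `trim`, and the minimum segment length `min_size` are hypothetical; the error variance is estimated from the one-break fit within each subsample, which is adequate only for serially uncorrelated errors; and the default critical value 9.63 is the 5% value quoted in Section 10, whose accompanying trimming fraction is not stated there.

```python
import numpy as np

def sup_f(y, trim=0.05):
    """supF statistic for one mean shift in y and the maximizing split point k.

    k is the number of observations assigned to the first regime.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    lo = max(1, int(np.floor(trim * n)))
    hi = n - lo
    if hi < lo:
        return 0.0, None
    csum, csq = np.cumsum(y), np.cumsum(y ** 2)
    ks = np.arange(lo, hi + 1)
    # Sums of squared residuals on each side of every candidate split.
    ssr_left = csq[ks - 1] - csum[ks - 1] ** 2 / ks
    right_sum = csum[-1] - csum[ks - 1]
    ssr_right = (csq[-1] - csq[ks - 1]) - right_sum ** 2 / (n - ks)
    ssr_split = ssr_left + ssr_right
    ssr_full = csq[-1] - csum[-1] ** 2 / n
    j = int(np.argmin(ssr_split))
    # Assumption: variance from the one-break fit (valid for i.i.d. errors only).
    sigma2 = ssr_split[j] / n
    if sigma2 <= 0:
        return float("inf"), int(ks[j])
    return float((ssr_full - ssr_split[j]) / sigma2), int(ks[j])

def estimate_breaks(y, crit=9.63, trim=0.05, min_size=10, offset=0):
    """Split recursively wherever the supF test rejects parameter constancy."""
    y = np.asarray(y, dtype=float)
    if y.size < 2 * min_size:
        return []
    stat, k = sup_f(y, trim)
    if k is None or stat <= crit:
        return []          # constancy not rejected: stop splitting this subsample
    left = estimate_breaks(y[:k], crit, trim, min_size, offset)
    right = estimate_breaks(y[k:], crit, trim, min_size, offset + k)
    return left + [offset + k] + right
```

The estimated number of breaks is `len(estimate_breaks(y))`, that is, the number of final subsamples minus 1. With serially correlated errors, `sigma2` would have to be replaced by an estimator of the long-run variance.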

Let $\hat m$ be the number of breaks determined by the above procedure and let $m_0$ be
the true number of breaks. We argue that $P(\hat m = m_0)$ converges to 1 as the sample
size grows without bound, provided the size of the tests converges to zero slowly. To
prove this assertion, we need the following general result. Let

\[
Y_t = \begin{cases}
\mu_1 + X_t & \text{if } -n_1 + 1 \le t \le 0, \\
\mu + X_t & \text{if } 1 \le t \le n, \\
\mu_2 + X_t & \text{if } n + 1 \le t \le n + n_2,
\end{cases}
\tag{9}
\]

where $n$ is a nonrandom integer and $n_1$ and $n_2$ are integer-valued random variables
such that $n_i = O_p(1)$ as $n \to \infty$. The first and the third regimes are dominated
by the second one in the sense that $n_i/n = O_p(n^{-1})$. Let $N = n + n_1 + n_2$. The
supF test is based on the difference between restricted and unrestricted sums of squared
residuals. More specifically, let $S_N = \sum_{t=-n_1+1}^{n+n_2} (Y_t - \bar Y)^2$ and
$S_N(k) = \sum_{t=-n_1+1}^{k} (Y_t - \bar Y_k)^2 + \sum_{t=k+1}^{n+n_2} (Y_t - \bar Y_k^*)^2$,
where $\bar Y$ is the full-sample mean, $\bar Y_k$ represents the sample mean for the first
$k + n_1$ observations, and $\bar Y_k^*$ represents the sample mean for the last
$n + n_2 - k$ observations. The supF test is then defined as, for some $\eta \in (0, 1/2)$,

\[
\sup F_N = \sup_{N\eta \le k \le N(1-\eta)} \frac{S_N - S_N(k)}{\hat\sigma^2},
\]

where $\hat\sigma^2$ is a consistent estimator of $a(1)^2\sigma_\epsilon^2$. Note that
$a(1)^2\sigma_\epsilon^2$ is proportional to the spectral density of $X_t$ at zero, which
can be consistently estimated in a number of ways.

Lemma 8. Under model (9) and assumptions A1-A2, as $n \to \infty$,

\[
\sup F_N \xrightarrow{d} \sup_{\eta \le \tau \le 1-\eta} \frac{|B(\tau) - \tau B(1)|^2}{\tau(1-\tau)},
\tag{10}
\]

where $B(\cdot)$ is standard Brownian motion on $[0,1]$.
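Critical values for this limiting distribution can be approximated by simulation. The following is a minimal sketch, assuming a grid approximation of Brownian motion by scaled partial sums of standard normal draws; the function names, the grid size `n_steps`, the number of replications `n_rep`, and the trimming value `eta=0.05` are illustrative choices rather than values taken from the paper. The discretization tends to understate the supremum slightly, so finer grids give more accurate quantiles.

```python
import numpy as np

def simulate_sup_stat(eta=0.05, n_steps=1000, n_rep=2000, seed=0):
    """Draws of sup_{eta <= tau <= 1-eta} |B(tau) - tau*B(1)|^2 / (tau*(1-tau))."""
    rng = np.random.default_rng(seed)
    tau = np.arange(1, n_steps + 1) / n_steps
    keep = (tau >= eta) & (tau <= 1 - eta)
    # Brownian motion on a grid: cumulative sums of N(0, 1/n_steps) increments.
    incr = rng.standard_normal((n_rep, n_steps)) / np.sqrt(n_steps)
    b = np.cumsum(incr, axis=1)
    bridge = b - tau * b[:, -1:]          # B(tau) - tau * B(1)
    stat = bridge[:, keep] ** 2 / (tau[keep] * (1 - tau[keep]))
    return stat.max(axis=1)

def critical_value(alpha, **kwargs):
    """Approximate upper-alpha critical value of the limiting distribution."""
    return float(np.quantile(simulate_sup_stat(**kwargs), 1 - alpha))
```

For example, `critical_value(0.05, eta=0.05)` returns a simulated 5% critical value for the chosen trimming; the result depends on `eta`, the grid, and the seed.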

The limiting distribution is identical to what it would be in the absence of the first
and the last regime in model (9). This is simply due to the stochastic boundedness
of $n_1$ and $n_2$. We assume that the supF test is used in the sequential procedure, and
the critical value and size of the test are based on the asymptotic distribution. Using
this lemma, we can prove

Proposition 8. Suppose that the size of the test $\alpha_T$ converges to zero slowly. Then
under model (8) and assumptions A1-A2,

\[
P(\hat m = m_0) \to 1, \quad \text{as } T \to \infty.
\]

Proof: Consider the event $\{\hat m < m_0\}$. When the estimated number of breaks is
less than the true number, there must exist a segment $[k, \ell]$ containing at least one
true break point which is nontrivial in the sense that both the distance from $k$ to the
break point and the distance from $\ell$ to the break point contain a positive fraction of
observations. That is, $k^0 - k > T\epsilon_0$ and $\ell - k^0 > T\epsilon_0$ for some
$\epsilon_0 > 0$, where $k^0 \in (k, \ell)$ is a break point. Then the test statistic based
on this subsample must diverge to infinity because the supF test is consistent; see
Andrews (1993). Thus, one will reject the null hypothesis of parameter constancy (as long
as $\alpha_T$ does not decrease too fast). This implies that $P(\hat m < m_0)$ converges to
zero as the sample size increases.

Next, consider the event $\{\hat m > m_0\}$. For $\hat m > m_0$ to be true, it must be the
case that for some $i$, at a certain stage in the sequential estimation, one rejects the
null hypothesis for the interval $[\hat k_i, \hat k_{i+1}]$, where
$\hat k_i = k_i^0 + O_p(1)$ and $\hat k_{i+1} = k_{i+1}^0 + O_p(1)$. That is, the given
interval contains no nontrivial break point, but the null hypothesis is rejected. Thus

\[
\begin{aligned}
P(\hat m > m_0) &\le P(\exists\, i,\ \text{reject parameter constancy for } [\hat k_i, \hat k_{i+1}]) \\
&\le \sum_{i=0}^{m_0} P(\text{reject parameter constancy for } [\hat k_i, \hat k_{i+1}]),
\end{aligned}
\]

where $\hat k_0 = 1$ and $\hat k_{m_0+1} = T$. Because $\hat k_i = k_i^0 + O_p(1)$ and
$\hat k_{i+1} = k_{i+1}^0 + O_p(1)$, if one lets $n = k_{i+1}^0 - k_i^0$ and
$N = \hat k_{i+1} - \hat k_i$, then the supF statistic computed for the subsample
$[\hat k_i, \hat k_{i+1}]$ converges in distribution, by Lemma 8, to the right hand side of
(10). Denote the limiting distribution by $\xi$. Suppose the size $\alpha_T$ and the
critical value $c_T$ are chosen using the asymptotic distribution such that
$P(\xi > c_T) \le \alpha_T$; then for large $T$ (and hence large $n$),
$P(\sup F_N > c_T) \le 2\alpha_T$. Thus $P(\hat m > m_0) \le (m_0 + 1)\,2\alpha_T$,
which converges to zero if $\alpha_T$ converges to zero as $T$ increases. This completes
the proof of the proposition. □

It remains unanswered at what rate $\alpha_T$ should converge to zero. Of course,
the rate should be low so that the critical value will not increase too quickly, in order
to guarantee a rejection under the alternative hypothesis. With the existence of a
nontrivial break, the statistic $\sup F_N$ is of order $T$. Therefore, any vanishing
sequence of $\alpha_T$ making $c_T$ of lower order than $T$ is sufficient. A more accurate
statement can also be made about the quickest rate for $\alpha_T$. Such a rate is clearly
linked to the tail behavior of the random variable $\xi$. In practice, the issue is perhaps
more empirical than theoretical. An appropriate choice requires an assessment of the
adverse effect of overestimating or underestimating on the problem under consideration.
If underestimating is more costly, a larger size may be used, and vice versa. For most
economic data with moderate size, we recommend a 5% significance level. Once the size is
chosen, all the rest can be automated.2

2 A computer program written in GAUSS for both sequential and simultaneous estimations is
available upon request. The program is introduced in Bai and Perron (1994).

Bai and Perron (1994) propose an alternative strategy for selecting the number of
breaks. We first describe their procedure for estimating the break points when the
number of breaks is known. In each round of estimation, their method selects only one
additional break. The single additional break is chosen such that the sum of squared
residuals for the total sample is reduced the most. For example, at the beginning of
the ith round, i - 1 breaks are already determined, yielding i subsamples. The ith
break point is chosen in the subsample for which the reduction in the sum of squared
residuals is the most. The procedure is repeated until the specified number of break
points is obtained. It is necessary to know when to terminate the procedure when
the number of breaks is unspecified. The stopping rule is based on a test for the
presence of an additional break given the number of breaks already obtained. The
number of breaks is the number of subsamples upon terminating the procedure minus
1. Again, assuming the size of the test approaches zero at a slow rate as the sample
size increases, the number of breaks determined in this way is also consistent. A
further alternative is proposed by Yao (1987). Yao suggests the BIC criterion. His
method requires simultaneous estimation.
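The "one additional break" selection with a known number of breaks can be sketched as follows for the mean-shift case. This is a hedged illustration rather than Bai and Perron's own code: the names `best_split` and `greedy_breaks` and the minimum segment length `min_size` are hypothetical, and no stopping test is included because the number of breaks `m` is taken as given.

```python
import numpy as np

def ssr(y):
    """Sum of squared residuals around the sample mean."""
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def best_split(y, min_size=2):
    """Best single split of y: (reduction in SSR, split point k), or (0.0, None)."""
    n = y.size
    if n < 2 * min_size:
        return 0.0, None
    csum, csq = np.cumsum(y), np.cumsum(y ** 2)
    ks = np.arange(min_size, n - min_size + 1)
    left = csq[ks - 1] - csum[ks - 1] ** 2 / ks
    right = (csq[-1] - csq[ks - 1]) - (csum[-1] - csum[ks - 1]) ** 2 / (n - ks)
    j = int(np.argmin(left + right))
    return ssr(y) - float(left[j] + right[j]), int(ks[j])

def greedy_breaks(y, m, min_size=2):
    """Add one break per round, in the subsample where total SSR falls the most."""
    y = np.asarray(y, dtype=float)
    breaks = []
    for _ in range(m):
        edges = [0] + sorted(breaks) + [y.size]
        candidates = []
        for a, b in zip(edges[:-1], edges[1:]):
            gain, k = best_split(y[a:b], min_size)
            if k is not None:
                candidates.append((gain, a + k))
        if not candidates:
            break
        breaks.append(max(candidates)[1])   # largest reduction wins the round
    return sorted(breaks)
```

Each returned value is the number of observations preceding the break, so a break at 40 means observations 1 through 40 form one regime.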

6.2.     Some  Comments 

Although the asymptotic theory implies that the sequential procedure will not
underestimate (in a probabilistic sense) the number of breaks, Monte Carlo simulations
show that the procedure has a tendency to underestimate. The problem is caused
in part by the inconsistent estimation of the error variance in the presence of
multiple breaks. When multiple breaks exist and only one is allowed in estimation, the
error variance cannot be consistently estimated (because of the inconsistency of the
regression parameters) and is biased upward. This decreases the power of the test,
so parameter constancy is less likely to be rejected. This also explains partially why
the conventional supF test may possess less power than the test proposed by Bai and
Perron (1994) in the presence of multiple breaks.

The problem may be overcome by using a two-step procedure. In the first step,
the goal is to obtain a consistent (or less biased) estimate of the error variance. This
can be achieved by allowing more breaks (solely for the purpose of constructing the error
variance). It is evident that as long as $m \ge m_0$, the error variance will be
consistently estimated. Obviously, one does not know whether $m \ge m_0$, but the
specification of $m$ in this stage is not as important as in the final model estimation.
When $m$ is fixed, the $m$ break points can be selected either by simultaneous estimation
or by the "one additional break" sequential procedure described in Bai and Perron (1994)
(no test is performed). In the second step, the number of breaks is determined by the
sequential procedure coupled with hypothesis testing. The test statistics use the error
variance estimator (as the denominator) obtained in the first step.

7.  Fine Tuning: Repartition

Although each estimated break point is T consistent, there is a tendency to over- or
underestimate the location of the breaks depending on whether $\lambda_i$ is positive or
negative. We now discuss a procedure that yields an estimator having the same asymptotic
distribution as the simultaneous estimators. We call the procedure repartition. The
idea of repartition is simple and was first introduced in Bai (1994b) in an empirical
application. This paper provides the theoretical basis for doing so. Suppose there
are $m$ breaks and initial T consistent estimators $\hat k_i$ ($i = 1, \ldots, m$) are
obtained. The repartition technique reestimates each of the break points based on the
initial estimates. To estimate $k_i^0$, the subsample $[\hat k_{i-1}, \hat k_{i+1}]$ is
used. We denote the resulting estimator by $\hat k_i^*$. Because of the proximity of
$\hat k_i$ to $k_i^0$, we effectively use the sample $[k_{i-1}^0 + 1, k_{i+1}^0]$ to
estimate $k_i^0$. Consequently, $\hat k_i^*$ is also T consistent for $k_i^0$, with a
limiting distribution identical to what it would be for a single break point model (or
for a model with multiple breaks estimated by the simultaneous method; see Bai and
Perron (1994)). In summary:

Proposition 9. Under model (8) and assumptions A1-A2, the repartition estimators
satisfy: for each $\epsilon > 0$, there exists an $M < \infty$ such that for all large $T$,

\[
P(|\hat k_i^* - k_i^0| > M) < \epsilon, \qquad (i = 1, 2, \ldots, m)
\]

and under the additional assumption A5,

\[
\hat k_i^* - k_i^0 \xrightarrow{d} \operatorname{argmin}_{\ell} W^{(i)}(\ell, 0), \qquad (i = 1, 2, \ldots, m).
\]

Note that assumption A6 is not required. The proposition only uses the fact
that the initial estimators are T consistent. As shown in Section 4, T consistent
estimators can be obtained regardless of the validity of A6 (or A4).

It is evident that the repartition method is straightforward to implement. Repartition
requires an additional T least squares computations. Altogether, no more than
$(m + 1)T$ least squares are necessary to obtain break point estimators that have the
same limiting distributions as those obtained by simultaneous estimation.
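A minimal sketch of repartition for the mean-shift case is given below. It assumes the initial estimates are distinct and are expressed as the number of observations preceding each break; the names `one_break` and `repartition` are hypothetical, and the sample boundaries 0 and T are used as the outer edges for the first and last break.

```python
import numpy as np

def one_break(y):
    """Least squares estimate of a single mean break in y (size of the first regime)."""
    n = y.size
    csum, csq = np.cumsum(y), np.cumsum(y ** 2)
    ks = np.arange(1, n)
    left = csq[ks - 1] - csum[ks - 1] ** 2 / ks
    right = (csq[-1] - csq[ks - 1]) - (csum[-1] - csum[ks - 1]) ** 2 / (n - ks)
    return int(ks[np.argmin(left + right)])

def repartition(y, initial_breaks):
    """Re-estimate each break using only the data between its two neighbours."""
    y = np.asarray(y, dtype=float)
    edges = [0] + sorted(initial_breaks) + [y.size]
    refined = []
    for i in range(1, len(edges) - 1):
        a, b = edges[i - 1], edges[i + 1]   # neighbouring initial estimates
        refined.append(a + one_break(y[a:b]))
    return refined
```

Every refined estimate is computed from the initial estimates, not from estimates already refined in the same pass. Each subsample contains only one nontrivial break when the initial estimates are close to the true break points, so a single-break fit is adequate there.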

8.  Small  Shifts 

The limiting distributions derived earlier, though of theoretical interest, are perhaps
of limited practical use because the distribution of
$\operatorname{argmin}_{\ell} W^{(i)}(\ell, \lambda_i)$ depends on that
of $X_t$ and is difficult to obtain. An alternative strategy is to consider small shifts in
which the magnitude of shifts converges to zero as the sample size increases to infinity.
The limiting distributions under this setup are invariant to the distribution of $X_t$ and
remain adequate even for moderate shifts. The result will be useful for constructing
confidence intervals for the break points.

For concreteness and ease of exposition, we consider the two-break model of Section
2. This also enables us to deliver a full proof of our results without much additional
effort. We assume that the mean $\mu_{i,T}$ for the $i$th regime can be written as
$\mu_{i,T} = v_T\tilde\mu_i$ ($i = 1, 2, 3$). We further assume

Assumption B1. The sequence of numbers $v_T$ satisfies

\[
v_T \to 0, \qquad T^{(1/2)-\delta} v_T \to \infty \quad \text{for some } \delta \in (0, 1/2). \tag{11}
\]

Because $v_T$ converges to zero, the function $U(\tau)$ defined in Section 2 will be a
constant function for all $\tau$. This can be seen from (4) and (5), with $\mu_j$
interpreted as $v_T\tilde\mu_j$. Therefore, assumption A4 is no longer appropriate. The
correct condition for $\hat\tau$ to be consistent for $\tau_1^0$ is

Assumption B2.

\[
\operatorname{plim}\; v_T^{-2}\,[U_T(k_1^0/T) - U_T(k_2^0/T)] < 0.
\]

This condition turns out to be equivalent to (6), with $\mu_j$ replaced by $\tilde\mu_j$.

Under B1 and B2, we shall argue that $\hat\tau$ is consistent for $\tau_1^0$. However, the
convergence rate is slower than T, which is expected because it is more difficult to
discern small shifts.

Proposition 10. Under assumptions A1-A3 and B1-B2 we have
$T v_T^2(\hat\tau - \tau_1^0) = O_p(1)$ or, equivalently, for every $\epsilon > 0$, there
exists an $M < \infty$ such that

\[
P\bigl(T|\hat\tau - \tau_1^0| > M v_T^{-2}\bigr) < \epsilon.
\]

The proof of this proposition is again based on some preliminary results analogous
to Lemmas 2 and 3. First we modify the objective function as

\[
S_T(k) - \sum_{t=1}^{T} X_t^2.
\]

This does not change the problem, as the second term is free of $k$.
Lemma 9. Under the assumptions of Proposition 10, we have

(a)

\[
\sup_{1 \le k \le T} \Bigl| U_T(k/T) - EU_T(k/T) - T^{-1}\sum_{t=1}^{T}(X_t^2 - EX_t^2) \Bigr| = O_p(T^{-1/2} v_T).
\]

(b) There exists $C_1 > 0$, only depending on $\tau_i^0$ and $\tilde\mu_j$
($i = 1, 2$, $j = 1, 2, 3$), such that

\[
ES_T(k) - ES_T(k_1^0) \ge C_1 v_T^2 |k - k_1^0| \quad \text{for all large } T.
\]

Corollary 1. Under the assumptions of Proposition 10,

\[
\hat\tau - \tau_1^0 = O_p\bigl((\sqrt{T}\, v_T)^{-1}\bigr).
\]

Proof: Adding and subtracting $\sum_{t=1}^{T}(X_t^2 - EX_t^2)$ in the identity

\[
S_T(k) - S_T(k_1^0) = S_T(k) - ES_T(k) - [S_T(k_1^0) - ES_T(k_1^0)] + ES_T(k) - ES_T(k_1^0),
\]

we obtain

\[
\begin{aligned}
S_T(k) - S_T(k_1^0)
&\ge -2 \sup_{1 \le j \le T} \Bigl| S_T(j) - ES_T(j) - \sum_{t=1}^{T}(X_t^2 - EX_t^2) \Bigr| + ES_T(k) - ES_T(k_1^0) \\
&\ge -2 \sup_{1 \le j \le T} \Bigl| S_T(j) - ES_T(j) - \sum_{t=1}^{T}(X_t^2 - EX_t^2) \Bigr| + C_1 v_T^2 |k - k_1^0|,
\end{aligned}
\]

where the second inequality follows from Lemma 9(b). From
$S_T(\hat k) - S_T(k_1^0) \le 0$, we have

\[
|\hat k - k_1^0| \le 2 C_1^{-1} v_T^{-2} \sup_{1 \le j \le T} \Bigl| S_T(j) - ES_T(j) - \sum_{t=1}^{T}(X_t^2 - EX_t^2) \Bigr|.
\]

The corollary is obtained upon dividing the above inequality by T on both sides and
using Lemma 9(a). □

Because $\sqrt{T}\, v_T \to \infty$, $\hat\tau$ is consistent for $\tau_1^0$. Using this
initial consistency, the rate of convergence stated in Proposition 10 can be proved. In
view of the anticipated rate of convergence, we define $D^*_{T,M}$ the same as $D_{T,M}$
but replacing $M$ by $M v_T^{-2}$. Thus for $k \in D^*_{T,M}$, it is possible for
$k - k_1^0$ to diverge to infinity because $v_T^{-2}$ converges to infinity, although at
a much slower rate than T.

Lemma 10. Under the assumptions of Proposition 10, for every $\epsilon > 0$, there exists
an $M > 0$ such that

\[
P\Bigl( \min_{k \in D^*_{T,M}} S_T(k) - S_T(k_1^0) \le 0 \Bigr) < \epsilon.
\]

Proof of Proposition 10. The proof is virtually identical to that of Proposition 2,
but one uses Lemma 10 instead of Lemma 4. □

Having obtained the rate of convergence, we examine the local behavior of the
objective function in appropriate neighborhoods of $k_1^0$ to obtain the limiting
distribution. Let $B_i(s)$ ($i = 1, 2$) be two independent Brownian motions on
$[0, \infty)$ with $B_i(0) = 0$ and define a two-sided drifted Brownian motion on
$\mathbb{R}$ as

\[
\Lambda(s, \lambda) = \begin{cases}
2B_1(-s) + |s|(1 + \lambda) & \text{if } s < 0, \\
2B_2(s) + |s|(1 - \lambda) & \text{if } s > 0,
\end{cases}
\]

with $\Lambda(0, \lambda) = 0$.

Proposition 11. Under the assumptions of Proposition 10,

\[
T(\mu_{2,T} - \mu_{1,T})^2(\hat\tau - \tau_1^0) \xrightarrow{d} a(1)^2\sigma_\epsilon^2\, \operatorname{argmin}_s \Lambda(s, \lambda_1),
\]

where $\lambda_1$ is defined in Proposition 4 with $\mu_j$ replaced by $\tilde\mu_j$.

While the density function of $\operatorname{argmin}_s \Lambda(s, \lambda_1)$ is derived in
Bai (1994b) so that confidence intervals can be constructed, it is suggested that the
repartitioned estimators be used. For the repartitioned estimator, the limiting
distribution corresponds to $\lambda_1 = 0$.
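The asymmetry induced by a nonzero $\lambda$ can be seen by simulating the minimizer of the two-sided drifted Brownian motion directly. The sketch below is only an illustration under stated assumptions: the process is approximated on a finite grid truncated at `s_max`, $|\lambda| < 1$ is assumed so that both drifts are positive, and the function name and all tuning values are hypothetical.

```python
import numpy as np

def argmin_lambda(lam, n_rep=2000, s_max=50.0, step=0.01, seed=0):
    """Simulated draws of argmin_s Lambda(s, lam) on the grid [-s_max, s_max]."""
    rng = np.random.default_rng(seed)
    n = int(round(s_max / step))
    s = step * np.arange(1, n + 1)
    out = np.empty(n_rep)
    for r in range(n_rep):
        b1 = np.cumsum(rng.standard_normal(n)) * np.sqrt(step)
        b2 = np.cumsum(rng.standard_normal(n)) * np.sqrt(step)
        left = 2 * b1 + s * (1 + lam)      # Lambda(-s, lam) for s > 0
        right = 2 * b2 + s * (1 - lam)     # Lambda(+s, lam) for s > 0
        i, j = int(np.argmin(left)), int(np.argmin(right))
        best = min(0.0, left[i], right[j])  # Lambda(0, lam) = 0 is also a candidate
        if best == 0.0:
            out[r] = 0.0
        elif left[i] <= right[j]:
            out[r] = -s[i]
        else:
            out[r] = s[j]
    return out
```

With $\lambda > 0$ the drift is steeper on the left than on the right, so the simulated minimizer falls more often, and farther out, on the positive side. With $\lambda = 0$ the two sides have the same law and the simulated distribution should look symmetric up to simulation noise.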

9.      Further  Extensions 

The preceding discussion has focused on multiple mean shifts in linear processes. The
whole procedure can be extended to multiple regressions, which are more useful in
econometric applications. Here we give conditions that ensure T consistency. These
conditions are similar to those in Bai (1994b) and Bai and Perron (1994).

Consider

\[
\begin{aligned}
Y_t &= Z_t'\delta_1 + X_t, & t &= 1, 2, \ldots, k_1^0, \\
Y_t &= Z_t'\delta_2 + X_t, & t &= k_1^0 + 1, \ldots, k_2^0, \\
&\ \ \vdots \\
Y_t &= Z_t'\delta_{m+1} + X_t, & t &= k_m^0 + 1, \ldots, T,
\end{aligned}
\]

where $Y_t$ as before is the observed dependent variable at time $t$; $Z_t$ ($q \times 1$)
is a vector of covariates; $\delta_j$ ($j = 1, \ldots, m + 1$) are the corresponding
vectors of coefficients with $\delta_i \ne \delta_{i+1}$ ($i = 1, \ldots, m$); $X_t$ is a
linear process satisfying A1-A2.

Assumption C1: The regressors satisfy:

\[
\operatorname{plim}\; \frac{1}{T}\sum_{t=1}^{[Tv]} Z_t Z_t' = Q(v)
\]

uniformly in $v \in [0, 1]$, where $Q(v)$ is a positive definite matrix for each $v > 0$
and $Q(v) - Q(u) > 0$ for $v > u$.

Assumption C2: For large $\ell$, the minimum eigenvalues of
$\frac{1}{\ell}\sum_{t=k_i^0+1}^{k_i^0+\ell} Z_t Z_t'$ and of
$\frac{1}{\ell}\sum_{t=k_i^0-\ell}^{k_i^0} Z_t Z_t'$ are bounded away from zero
($i = 1, \ldots, m + 1$).

Assumption C3: The disturbances $\{X_t\}$ satisfy one of the following alternatives:

a) $\{X_t, \mathcal{F}_t\}$ forms a sequence of martingale differences, where
$\mathcal{F}_t = \sigma\text{-field}\{Z_{s+1}, X_s;\ s \le t\}$, with
$\sup_t E|X_t|^{4+\delta} < \infty$.

b) $X_t$ is independent of $Z_s$ for all $t$ and all $s$, but $\{X_t\}$ forms a sequence of
mixingales satisfying conditions given in Bai and Perron (1994).

Assumption C4: $k_i^0 = [T\tau_i^0]$, $\tau_i^0 \in (0, 1)$ with
$\tau_i^0 < \tau_{i+1}^0$ ($i = 1, \ldots, m$).

Assumption C1 is satisfied by i.i.d. regressors having a finite variance. It is also
satisfied by any second order stationary process such that the strong law of large
numbers holds for $Z_t Z_t'$. In these cases, $Q(v) = vQ$, where $Q = EZ_t Z_t'$. Trending
regressors also satisfy C1. More interestingly, it is satisfied by autoregressive models
with breaks. Suppose $Z_t = (1, Y_{t-1}, \ldots, Y_{t-q-1})$. Although $Z_t$ is not
globally stationary, its adjustment to its new stationary path is very quick after a
break takes place. Thus for segment $i$,
$\frac{1}{\Delta k_i}\sum_{t=k_i^0+1}^{k_i^0+[v\Delta k_i]} Z_t Z_t'$ converges in
probability to $vQ_i$, where $\Delta k_i = k_{i+1}^0 - k_i^0$ and $Q_i$ is the second
moment matrix of a stationary autoregressive process with autoregressive parameters
$\delta_i$. Therefore,
$Q(v) = \tau_1^0 Q_1 + \cdots + (\tau_i^0 - \tau_{i-1}^0) Q_i + (v - \tau_i^0) Q_{i+1}$
for $v \in [\tau_i^0, \tau_{i+1}^0]$.

Assumption C2 requires that there be sufficient data near the break point. It is
used for T consistency. Part (a) of Assumption C3 allows for autoregressive models or
models with lagged dependent variables. Part (b) allows for general serially correlated
disturbances. A mixingale sequence includes many dependent processes as special
cases. These assumptions are similar to those of Bai (1994b) and Bai and Perron
(1994).

Under  the  assumptions  C1-C4,  using  the  argument  presented  earlier  in  this  paper 
together  with  that  of  Bai  (1994b),  it  can  be  shown  that  the  sequential  estimators 
are  T  consistent.  The  repartitioned  estimators  have  limiting  distributions  identical 
to  simultaneous  estimators.  Therefore,  confidence  intervals  can  be  constructed  in  the 
way  given  in  Bai  (1994b)  and  Bai  and  Perron  (1994). 

10.      Some  Simulated  Results 

This section reports results from some Monte Carlo simulations. The data are
generated according to a model with three mean breaks. Let $(\mu_1, \ldots, \mu_4)$ denote
the mean parameters and $(k_1^0, k_2^0, k_3^0)$ denote the break points. We consider two
sets of mean parameters. The first set is given by (1.0, 2.0, 1.0, 0.0), and the second by
(1.0, 2.0, -1.0, 1.0). The sample size T is taken to be 160 with break points at
(40, 80, 120) for both sets of mean parameters. The disturbances $\{X_t\}$ are i.i.d.
standard normal. All reported results are based on 5000 repetitions.
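For readers who wish to replicate the design, the data generating process just described can be written as below. This is a sketch under the stated design only; the names `MODELS`, `mean_path`, and `simulate` are hypothetical, and the random number generator and seed are not those used for the reported results.

```python
import numpy as np

T = 160
BREAKS = (40, 80, 120)
MODELS = {
    "I": (1.0, 2.0, 1.0, 0.0),
    "II": (1.0, 2.0, -1.0, 1.0),
}

def mean_path(model):
    """Piecewise-constant mean of Y_t for t = 1, ..., T (stored at index t - 1)."""
    edges = (0,) + BREAKS + (T,)
    pieces = [np.full(b - a, mu)
              for a, b, mu in zip(edges[:-1], edges[1:], MODELS[model])]
    return np.concatenate(pieces)

def simulate(model, rng):
    """One sample of length T with i.i.d. standard normal disturbances."""
    return mean_path(model) + rng.standard_normal(T)
```

A Monte Carlo study then amounts to calling `simulate(model, rng)` repeatedly with a single `rng = np.random.default_rng(seed)` and applying a break point estimator to each sample.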

First  we  assume  the  number  of  break  points  is  known  and  focus  on  their  estimation. 
The  break  points  are  chosen  using  the  suggestion  of  Bai  and  Perron  (1994),  "one  and 
only  one  additional  break"  in  each  round.  A  chosen  break  point  must  achieve  greatest 
reduction  in  total  sum  of  squared  residuals  for  that  round  of  estimation. 

Figure 1 displays the estimated break points for the first set of parameters [called
model (I)]. To verify the theory and for comparison purposes, three different methods
are used: sequential, repartition, and simultaneous. Because, for model (I), the
magnitude of shift for each break is the same, we expect the three estimated
break points to have a similar distribution for the repartition and simultaneous
methods. This is indeed so, as suggested by the histograms. For sequential estimation,
the distribution of the estimated break points shows asymmetry, as suggested by the
theory. This asymmetry is removed by the repartition procedure.

Figure 2 displays the corresponding results for the second set of parameters [model
(II)]. Because the middle break has the largest magnitude of shift, it is estimated
with the highest precision, followed by the third, and then by the first. Note
that the sequential method picks up the middle break point first. This
has two implications. One, the first and third estimated break points will have the
same limiting distribution as under simultaneous estimation, even without repartition.
This explains why the results look homogeneous for the three different methods. Two,
only the middle break point will have an asymmetric distribution for the sequential
method. This asymmetry is again removed by repartition.

These  simulation  results  are  entirely  consistent  with  the  theory.  Also  remarkable 
is  the  match  rate  for  the  repartition  and  simultaneous  methods.  They  yield  almost 
identical  results  in  the  simulation.  The  match  rate  for  model  (I)  is  over  92%,  while 
for  model  (II)  the  rate  is  over  99.5%. 

We also perform some limited Monte Carlo simulations for estimating the number
of breaks using the sequential method. In addition to the two sets of parameters
considered earlier, we add a third set of parameters, which is (1.0, 2.0, 3.0, 4.0)
[referred to as model (III)]. For comparison purposes, estimates using the BIC method are
also given. Figure 3 displays the estimated numbers for both methods. The left three
histograms (a, b, c) are for the sequential method and the right three (a', b', c') are
for the BIC method. The sequential method uses the two-step procedure described in
Section 6.2. We assume the number of breaks is 4 in the first step and estimate the
error variance based on repartitioned estimators. The size of the test is chosen to be
0.05 with the corresponding critical value 9.63.

For the first set of parameters, the BIC criterion does a better job than the
sequential method. The latter underestimates the number of breaks. For a significant
proportion  of  observations,  the  sequential  method  only  detects  a  single  break.  We  find 
that  the  single  break  identified  by  the  sequential  method  in  most  cases  is  the  third 
break.  Put  another  way,  the  sequential  method  has  difficulties  in  finding  breaks  if 
the  first  80  observations  are  used.  Indeed,  the  supF  test  has  lower  power  in  detecting 
"hat"  shaped  mean  changes  (especially  for  small  samples  and  less  pronounced  shifts). 
For  the  second  set  of  parameters,  the  two  methods  are  comparable.  Interestingly,  the 
sequential  method  works  better  than  the  BIC  criterion  for  the  third  set  of  parameters. 

The  sequential  method  may  be  improved  upon  in  at  least  two  dimensions.  First, 
the  supF  test  which  is  designed  for  testing  a  single  break  may  be  replaced  by,  or  used 
in conjunction with, Bai and Perron's supF($\ell$) test for testing multiple breaks. The
latter  test  has  better  power  in  the  presence  of  multiple  breaks.  Other  tests  such  as  the 
exponential  type  or  average  type  tests  can  also  be  used;  see  Andrews  and  Ploberger 
(1994).  Second,  the  critical  values  may  be  chosen  using  small  sample  distributions 
rather  than  limiting  distributions.  There  are  certain  degrees  of  flexibility  in  the  choice 
of  sizes  as  well.  In  any  case,  the  sequential  procedure  seems  promising.  Further 
investigation  is  warranted. 

11.      Summary 

We  have  developed  some  underlying  theory  for  estimating  multiple  breaks  one  at 
a  time.  We  proved  that  the  estimated  break  points  are  T  consistent  and  we  also 
derived  their  limiting  distributions.  A  number  of  ideas  have  been  presented  to  analyze 
multiple  local  minima,  to  obtain  estimators  having  the  same  limiting  distribution 
as  those  of  simultaneous  estimation,  and  to  consistently  determine  the  number  of 
breaks  in  the  data.  The  proposed  repartition  method  is  particularly  useful  because  it 
allows  confidence  intervals  to  be  constructed  as  if  simultaneous  estimation  were  used. 
Of course, the repartition estimators are not necessarily identical to simultaneous
estimators.

Appendix: Mathematical Proofs

Throughout the proof, the notation $o_p(1)$ [$O_p(1)$] is used to denote a sequence of
random variables converging to zero in probability [stochastically bounded]. All limits
are taken as the sample size T converges to infinity, unless stated otherwise. We may
write $X \stackrel{d}{=} Y$ when X and Y have the same distribution.

The first two lemmas in the text are closely related. We first derive some results
common to these two lemmas. We need to examine $U_T(k/T)$ for all $k \in [1, T]$.

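For $k \le k_1^0$,

\[
\bar Y_k = \frac{1}{k}\sum_{t=1}^{k} Y_t = \mu_1 + \frac{1}{k}\sum_{t=1}^{k} X_t,
\]

\[
\bar Y_k^* = \frac{1}{T-k}\sum_{t=k+1}^{T} Y_t
= \frac{k_1^0 - k}{T-k}\mu_1 + \frac{k_2^0 - k_1^0}{T-k}\mu_2 + \frac{T - k_2^0}{T-k}\mu_3 + \frac{1}{T-k}\sum_{t=k+1}^{T} X_t.
\]

Throughout, we define $A_{Tk}$ and $A^*_{Tk}$ as

\[
A_{Tk} = \frac{1}{k}\sum_{t=1}^{k} X_t, \qquad A^*_{Tk} = \frac{1}{T-k}\sum_{t=k+1}^{T} X_t.
\]

Thus

\[
\sum_{t=1}^{k}(Y_t - \bar Y_k)^2 = \sum_{t=1}^{k}\Bigl(X_t - \frac{1}{k}\sum_{i=1}^{k} X_i\Bigr)^2 = \sum_{t=1}^{k}(X_t - A_{Tk})^2 \tag{13}
\]

and

\[
\begin{aligned}
\sum_{t=k+1}^{T}(Y_t - \bar Y_k^*)^2
&= \sum_{t=k+1}^{k_1^0}(\mu_1 + X_t - \bar Y_k^*)^2 + \sum_{t=k_1^0+1}^{k_2^0}(\mu_2 + X_t - \bar Y_k^*)^2 + \sum_{t=k_2^0+1}^{T}(\mu_3 + X_t - \bar Y_k^*)^2 \\
&= \sum_{t=k+1}^{k_1^0}\Bigl[\tfrac{1}{T-k}\{(T - k_1^0)(\mu_1 - \mu_2) + (T - k_2^0)(\mu_2 - \mu_3)\} + X_t - A^*_{Tk}\Bigr]^2 \\
&\quad + \sum_{t=k_1^0+1}^{k_2^0}\Bigl[\tfrac{1}{T-k}\{(k_1^0 - k)(\mu_2 - \mu_1) + (T - k_2^0)(\mu_2 - \mu_3)\} + X_t - A^*_{Tk}\Bigr]^2 \\
&\quad + \sum_{t=k_2^0+1}^{T}\Bigl[\tfrac{1}{T-k}\{(k_1^0 - k)(\mu_2 - \mu_1) + (k_2^0 - k)(\mu_3 - \mu_2)\} + X_t - A^*_{Tk}\Bigr]^2.
\end{aligned}
\]

The latter expression can be rewritten as

\[
\begin{aligned}
\sum_{t=k+1}^{T}(Y_t - \bar Y_k^*)^2
&= (k_1^0 - k)a_{Tk}^2 + 2a_{Tk}\sum_{t=k+1}^{k_1^0}(X_t - A^*_{Tk}) \\
&\quad + (k_2^0 - k_1^0)b_{Tk}^2 + 2b_{Tk}\sum_{t=k_1^0+1}^{k_2^0}(X_t - A^*_{Tk}) \\
&\quad + (T - k_2^0)c_{Tk}^2 + 2c_{Tk}\sum_{t=k_2^0+1}^{T}(X_t - A^*_{Tk}) \\
&\quad + \sum_{t=k+1}^{T}(X_t - A^*_{Tk})^2
\end{aligned}
\tag{14}
\]

where

\[
\begin{aligned}
a_{Tk} &= \frac{1}{T-k}\{(T - k_1^0)(\mu_1 - \mu_2) + (T - k_2^0)(\mu_2 - \mu_3)\}, \\
b_{Tk} &= \frac{1}{T-k}\{(k_1^0 - k)(\mu_2 - \mu_1) + (T - k_2^0)(\mu_2 - \mu_3)\}, \\
c_{Tk} &= \frac{1}{T-k}\{(k_1^0 - k)(\mu_2 - \mu_1) + (k_2^0 - k)(\mu_3 - \mu_2)\}.
\end{aligned}
\]

Rewrite

\[
\frac{1}{T}\sum_{t=1}^{k}(X_t - A_{Tk})^2 = \frac{1}{T}\sum_{t=1}^{k}X_t^2 - \frac{k}{T}(A_{Tk})^2 \tag{15}
\]

and

\[
\frac{1}{T}\sum_{t=k+1}^{T}(X_t - A^*_{Tk})^2 = \frac{1}{T}\sum_{t=k+1}^{T}X_t^2 - \frac{T-k}{T}(A^*_{Tk})^2. \tag{16}
\]

Combining (13) and (14) and using (15) and (16), we have for $k \le k_1^0$,

\[
U_T(k/T) = \frac{1}{T}S_T(k) = \frac{k_1^0 - k}{T}a_{Tk}^2 + \frac{k_2^0 - k_1^0}{T}b_{Tk}^2 + \frac{T - k_2^0}{T}c_{Tk}^2 + \frac{1}{T}\sum_{t=1}^{T}X_t^2 + R_{1T}(k) \tag{17}
\]

where

\[
\begin{aligned}
R_{1T}(k) &= \frac{1}{T}\Bigl\{2a_{Tk}\sum_{t=k+1}^{k_1^0}X_t + 2b_{Tk}\sum_{t=k_1^0+1}^{k_2^0}X_t + 2c_{Tk}\sum_{t=k_2^0+1}^{T}X_t\Bigr\} \\
&\quad - \frac{2}{T}\bigl[(k_1^0 - k)a_{Tk} + (k_2^0 - k_1^0)b_{Tk} + (T - k_2^0)c_{Tk}\bigr]A^*_{Tk} \\
&\quad - \frac{k}{T}(A_{Tk})^2 - \frac{T-k}{T}(A^*_{Tk})^2.
\end{aligned}
\tag{18}
\]

We shall argue that

\[
R_{1T}(k) = O_p(T^{-1/2}) \quad \text{uniformly in } k \in [1, k_1^0]. \tag{19}
\]

Because $a_{Tk}$, $b_{Tk}$, and $c_{Tk}$ are uniformly bounded in $T$ and in
$k \le [T\tau_1^0]$, and $A^*_{Tk} = O_p(T^{-1/2})$, it is easy to see that the first two
expressions on the right hand side of (18) are $O_p(T^{-1/2})$ uniformly in
$k \le [T\tau_1^0]$. The second to last term is $T^{-1}O_p(\log^2 T)$ because
$\sup_{1 \le k \le T}\bigl|\frac{1}{\sqrt{k}}\sum_{i=1}^{k}X_i\bigr| = O_p(\log T)$. Finally,
the last term is $O_p(T^{-1})$ because $A^*_{Tk} = O_p(T^{-1/2})$ uniformly in
$k \le k_1^0$. This gives $R_{1T}(k) = O_p(T^{-1/2})$ uniformly in $k \le k_1^0$.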

Next  consider  k  €  [k°  +  1,  k°].  We  have 

>it  =  y  Mi  +  ~ Jp"M*  +  ATfc, 


it0  -  A- 
v«  _  2? 


M2  + 


r-fc° 


Ma  +  Aj.fc. 


Thus 


X  -  n  =  { 


Yt  -y:  =  { 


^.(fll-M2)  +  Xt-ATk     ift€[l,*a 
%(H2  -  m)  +  Xt  -  ATk        iite[k°  +  l,k] 
'  ^■(fX2-fl3)  +  Xt-A'Tk     tite[k  +  l,k°] 


T-k 
k°-k 


.  ££(;*  -  Ma)  +  *t  -  ^r*      if  *  €  [t§  +  1,  T]. 
Hence  for  k  G  [Jfc?  -J- 1,  fc°], 

E(^  -  ^)2 

=  *°4fc  +  2<fr*  £(Xt  -  ATk)  +  Jt(X<  ~  ATkf 


t=\ 


t=\ 


t=\ 


+(k-k°1)e2Tk  +  2eTk    E   (Xt-ATk)+    E   (X,  -  ATfc)2 
t=jt°+i  t=*?+i 

where  drfc  =  -T_L(/ii  —  M2)  and  e™  =  -}J-(M2  -  Mi),  and, 


(20) 


E  re  -  y;y 


t=k+l 


*z 


*§ 


=  (k°2  -  k)fTk  +  2fTk  E(*<  -  Ark)  +  E(^  "  ATkf 

fc+i  fc+i 

+(T  -  k°2)g2Tk  +  2gTk  E  (Xt  -  A*Tk)  +  E  {Xt  -  A*Tkf 

fc§+l  fc°+l 


(21) 
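where $f_{Tk} = \frac{T - k_2^0}{T - k}(\mu_2 - \mu_3)$ and $g_{Tk} = \frac{k_2^0 - k}{T - k}(\mu_3 - \mu_2)$. Therefore,

\[ U_T(k/T) = \frac{1}{T} S_T(k) = \frac{k_1^0}{T} d_{Tk}^2 + \frac{k - k_1^0}{T} e_{Tk}^2 + \frac{k_2^0 - k}{T} f_{Tk}^2 + \frac{T - k_2^0}{T} g_{Tk}^2 + \frac{1}{T}\sum_{t=1}^{T} X_t^2 + R_{2T}(k) \tag{22} \]

\[ = \frac{k_1^0 (k - k_1^0)}{Tk}(\mu_2 - \mu_1)^2 + \frac{(k_2^0 - k)(T - k_2^0)}{T(T - k)}(\mu_3 - \mu_2)^2 + \frac{1}{T}\sum_{t=1}^{T} X_t^2 + R_{2T}(k) \]

where

\[ R_{2T}(k) = \frac{1}{T}\Big\{ 2 d_{Tk} \sum_{t=1}^{k_1^0} X_t + 2 e_{Tk} \sum_{t=k_1^0+1}^{k} X_t + 2 f_{Tk} \sum_{t=k+1}^{k_2^0} X_t + 2 g_{Tk} \sum_{t=k_2^0+1}^{T} X_t \Big\} - \frac{2}{T}\Big[ k_1^0 d_{Tk} + (k - k_1^0) e_{Tk} + (k_2^0 - k) f_{Tk} + (T - k_2^0) g_{Tk} \Big] A^*_{Tk} - \frac{k}{T}(A_{Tk})^2 - \frac{T - k}{T}(A^*_{Tk})^2. \tag{23} \]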


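Using the uniform boundedness of $d_{Tk}$, $e_{Tk}$, $f_{Tk}$ and $g_{Tk}$ as well as $A_{Tk} = O_p(T^{-1/2})$ and $A^*_{Tk} = O_p(T^{-1/2})$ uniformly in $k \in [k_1^0 + 1, k_2^0]$, we can easily show that $R_{2T}(k) = O_p(T^{-1/2})$ uniformly in $k \in [k_1^0 + 1, k_2^0]$. As for $k > k_2^0$, using the symmetry with the first regime, we have

\[ U_T(k/T) = \frac{k_1^0}{T} h_{Tk}^2 + \frac{k_2^0 - k_1^0}{T} p_{Tk}^2 + \frac{k - k_2^0}{T} q_{Tk}^2 + \frac{1}{T}\sum_{t=1}^{T} X_t^2 + R_{3T}(k) \tag{24} \]

where, similar to before, $R_{3T}(k) = O_p(T^{-1/2})$ uniformly for $k \in [k_2^0, T]$ and

\[ h_{Tk} = \frac{1}{k}\big[(k - k_1^0)(\mu_1 - \mu_2) + (k - k_2^0)(\mu_2 - \mu_3)\big] \]

\[ p_{Tk} = \frac{1}{k}\big[k_1^0(\mu_2 - \mu_1) + (k - k_2^0)(\mu_2 - \mu_3)\big] \]

\[ q_{Tk} = \frac{1}{k}\big[k_1^0(\mu_2 - \mu_1) + k_2^0(\mu_3 - \mu_2)\big]. \]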

Proof of Lemma 1. Because $a_{Tk}, b_{Tk}, \ldots, q_{Tk}$ all have uniform limits for $k = [T\tau]$ and the stochastic terms in (17), (22), and (24) all have uniform limits in pertinent regions for $\tau \in [0, 1]$, the uniform convergence of $U_T(\tau)$ follows easily. The uniform limit of $U_T(\tau)$ is also easy to obtain. Note that (4) and (5) are obtained, respectively, by taking $k = k_1^0$ and $k = k_2^0$ in (22) and letting $T \to \infty$.
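As a sanity check on the second line of (22), the closed form for the deterministic part of $U_T(k/T)$ can be compared with a direct computation on noise-free data. The sketch below is illustrative only: the helper names `ssr_split` and `u_middle`, the sample size, the break dates, and the regime means are all assumptions chosen for the example, not values from the paper.

```python
def ssr_split(y, k):
    """S_T(k): total sum of squared residuals when the sample is split after observation k."""
    def ssr(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)
    return ssr(y[:k]) + ssr(y[k:])


def u_middle(k, T, k1, k2, mu):
    """Deterministic part of (22) for k in [k1 + 1, k2], i.e. with X_t = 0 for all t."""
    m1, m2, m3 = mu
    first = k1 * (k - k1) / (T * k) * (m2 - m1) ** 2
    second = (k2 - k) * (T - k2) / (T * (T - k)) * (m3 - m2) ** 2
    return first + second


# Hypothetical design: T = 120 with breaks after observations 40 and 90.
T, k1, k2 = 120, 40, 90
mu = (1.0, 3.0, 2.0)
y = [mu[0]] * k1 + [mu[1]] * (k2 - k1) + [mu[2]] * (T - k2)

# Largest discrepancy between the direct computation and the closed form.
max_gap = max(
    abs(ssr_split(y, k) / T - u_middle(k, T, k1, k2, mu))
    for k in range(k1 + 1, k2 + 1)
)
print(max_gap)
```

With no noise, $R_{2T}(k)$ and the $X_t^2$ term vanish, so the two computations should agree up to floating point error.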

Proof of Lemma 2. The only stochastic terms in (17), (22), and (24) are $R_{iT}(k)$ $(i = 1, 2, 3)$, each of which is $O_p(T^{-1/2})$ uniformly over pertinent regions for $k$. Furthermore, it is easy to see that $ER_{iT}(k) = O(T^{-1})$ uniformly in $k$ $(i = 1, 2, 3)$. These results imply Lemma 2.

To  prove  Lemma  3,  we  need  additional  results. 

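Lemma 11. There exists an $M < \infty$ such that for all $i$ and all $j > i$,

\[ \Big| E\Big\{ \Big(\sum_{t=1}^{i} X_t\Big)\Big(\sum_{s=i+1}^{j} X_s\Big) \Big\} \Big| \le M. \]

Proof: Let $\gamma(h) = E(X_t X_{t+h})$. Then under assumptions A1-A2, it is easy to argue that $\sum_{h=1}^{\infty} h|\gamma(h)| < \infty$. Now

\[ \Big| E\Big(\sum_{t=1}^{i} X_t\Big)\Big(\sum_{s=i+1}^{j} X_s\Big) \Big| = \Big| \sum_{t=1}^{i} \sum_{s=i+1}^{j} \gamma(s - t) \Big| \le \sum_{h=1}^{j-1} h|\gamma(h)| \le \sum_{h=1}^{\infty} h|\gamma(h)| < \infty. \]

□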
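The bound in Lemma 11 can be illustrated for a specific process. The following sketch assumes an AR(1) process with coefficient `rho` and unit innovation variance, which is only one example of a process with summable $h|\gamma(h)|$; the function name `cross_cov` and the parameter values are hypothetical. For this process $\sum_{h \ge 1} h\gamma(h) = \rho / \{(1-\rho)^2(1-\rho^2)\}$.

```python
def cross_cov(i, j, gamma):
    """E[(X_1 + ... + X_i)(X_{i+1} + ... + X_j)] for autocovariance function gamma."""
    return sum(gamma(s - t) for t in range(1, i + 1) for s in range(i + 1, j + 1))


rho = 0.6  # assumed AR(1) coefficient, |rho| < 1


def gamma(h):
    """Autocovariance of an AR(1) process with unit innovation variance."""
    return rho ** abs(h) / (1 - rho ** 2)


# Closed form of sum_{h >= 1} h * gamma(h) for this AR(1) example.
bound = rho / ((1 - rho) ** 2 * (1 - rho ** 2))

# Largest cross covariance over a range of split points i and end points j.
worst = max(
    abs(cross_cov(i, j, gamma))
    for i in range(1, 30)
    for j in range(i + 1, 60)
)
print(worst, bound)
```

The cross covariance stays below the bound however long the two blocks are, which is the property used in the proof of Lemma 12.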

We will also use the following result: there exists an $M < \infty$, such that for arbitrary $i < j$,

\[ E\Big( \frac{1}{\sqrt{j - i}} \sum_{t=i+1}^{j} X_t \Big)^2 \le M. \tag{25} \]

In the sequel, we shall use $a_{Tk}$ and $a_T(k)$ interchangeably. Similar notations are also adopted for $A_{Tk}$, $A^*_{Tk}$ as well as for $b_{Tk}, c_{Tk}, \ldots$

Lemma 12. Under A1-A3, there exists an $M < \infty$ such that

\[ T|ER_{1T}(k) - ER_{1T}(k_1^0)| \le \frac{|k_1^0 - k|}{T} M. \]


Proof: The expected value of the first two terms on the right hand side of (18) is zero. We thus need to consider the last two terms. For $k \le k_1^0$,

\[ \frac{1}{k}\Big(\sum_{t=1}^{k} X_t\Big)^2 - \frac{1}{k_1^0}\Big(\sum_{t=1}^{k_1^0} X_t\Big)^2 = \Big(\frac{1}{k} - \frac{1}{k_1^0}\Big)\Big(\sum_{t=1}^{k} X_t\Big)^2 - \frac{2}{k_1^0}\Big(\sum_{t=1}^{k} X_t\Big)\Big(\sum_{t=k+1}^{k_1^0} X_t\Big) - \frac{1}{k_1^0}\Big(\sum_{t=k+1}^{k_1^0} X_t\Big)^2 \tag{26} \]

\[ = \frac{k_1^0 - k}{k_1^0}\Big(\frac{1}{\sqrt{k}}\sum_{t=1}^{k} X_t\Big)^2 - \frac{2}{k_1^0}\Big(\sum_{t=1}^{k} X_t\Big)\Big(\sum_{t=k+1}^{k_1^0} X_t\Big) - \frac{k_1^0 - k}{k_1^0}\Big(\frac{1}{\sqrt{k_1^0 - k}}\sum_{t=k+1}^{k_1^0} X_t\Big)^2. \]

Applying Lemma 11 to the second term above and (25) to the first and the third terms, we see that the absolute value of the expectation of (26) is bounded by $M|k_1^0 - k|/T$. This result also holds for $k > k_1^0$ (one only needs to use $\sum_{t=1}^{k} = \sum_{t=1}^{k_1^0} + \sum_{t=k_1^0+1}^{k}$ in the proof). By symmetry,

\[ |(T - k)E\{A^*_T(k)\}^2 - (T - k_1^0)E\{A^*_T(k_1^0)\}^2| \le M|k_1^0 - k|/T. \]

Combining these results, we obtain Lemma 12. □

Note that the expected values of $R_{jT}(k)$ for $j = 1, 2, 3$ have an identical expression as functions of $k$. We thus have

\[ T|ER_{iT}(k) - ER_{iT}(k_1^0)| \le \frac{|k_1^0 - k|}{T} M, \quad (i = 1, 2, 3). \tag{27} \]

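Proof of Lemma 3. For $k \le k_1^0$, using (17) with some algebra, we obtain

\[ ES_T(k) - ES_T(k_1^0) = (k_1^0 - k) a_T(k)^2 + (k_2^0 - k_1^0)[b_T(k)^2 - b_T(k_1^0)^2] + (T - k_2^0)[c_T(k)^2 - c_T(k_1^0)^2] + T\{ER_{1T}(k) - ER_{1T}(k_1^0)\} \]

\[ = \frac{k_1^0 - k}{(1 - k/T)(1 - k_1^0/T)}\big[(1 - k_1^0/T)(\mu_1 - \mu_2) + (1 - k_2^0/T)(\mu_2 - \mu_3)\big]^2 + T\{ER_{1T}(k) - ER_{1T}(k_1^0)\}. \tag{28} \]

Because $|(k_i^0/T) - \tau_i^0| \le T^{-1}$ $(i = 1, 2)$,

\[ ES_T(k) - ES_T(k_1^0) = \frac{k_1^0 - k}{(1 - k/T)(1 - k_1^0/T)}\big[(1 - \tau_1^0)(\mu_1 - \mu_2) + (1 - \tau_2^0)(\mu_2 - \mu_3)\big]^2 + O\Big(\frac{|k_1^0 - k|}{T}\Big) + T\{ER_{1T}(k) - ER_{1T}(k_1^0)\}. \]

We claim that when $U(\tau_1^0) < U(\tau_2^0)$,

\[ C = (1 - \tau_1^0)(\mu_1 - \mu_2) + (1 - \tau_2^0)(\mu_2 - \mu_3) \ne 0. \tag{29} \]

Condition $U(\tau_1^0) < U(\tau_2^0)$ is equivalent to

\[ \frac{1 - \tau_2^0}{1 - \tau_1^0}(\mu_2 - \mu_3)^2 < \frac{\tau_1^0}{\tau_2^0}(\mu_1 - \mu_2)^2. \]

Multiplying both sides of the above by $(1 - \tau_2^0)(1 - \tau_1^0)$ and using $(1 - \tau_2^0)\tau_1^0/\tau_2^0 < 1 - \tau_1^0$, we obtain

\[ (1 - \tau_2^0)^2(\mu_2 - \mu_3)^2 < (1 - \tau_1^0)^2(\mu_1 - \mu_2)^2. \]

This verifies (29). Together with Lemma 12, we have

\[ ES_T(k) - ES_T(k_1^0) \ge (k_1^0 - k)C^2 - O\Big(\frac{|k_1^0 - k|}{T}\Big) \ge (k_1^0 - k)C^2/2 \tag{30} \]

for all large $T$.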
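The step from the condition $U(\tau_1^0) < U(\tau_2^0)$ to $C \ne 0$ in (29) can be spot-checked numerically. The sketch below draws random break fractions and regime means (the ranges, the seed, and the function name `count_violations` are assumptions for illustration) and counts cases where the condition holds but the implied inequality $(1-\tau_2^0)^2(\mu_2-\mu_3)^2 < (1-\tau_1^0)^2(\mu_1-\mu_2)^2$ fails.

```python
import random


def count_violations(trials=20000, seed=0):
    """Return (violations, cases checked) for the implication behind (29)."""
    rng = random.Random(seed)
    violations, checked = 0, 0
    for _ in range(trials):
        t1, t2 = sorted(rng.uniform(0.05, 0.95) for _ in range(2))
        if t2 - t1 < 1e-3:
            continue  # keep the two break fractions clearly separated
        m1, m2, m3 = (rng.uniform(-5.0, 5.0) for _ in range(3))
        lhs = (1 - t2) / (1 - t1) * (m2 - m3) ** 2
        rhs = t1 / t2 * (m1 - m2) ** 2
        if lhs < rhs:  # this is the condition U(tau_1) < U(tau_2)
            checked += 1
            c = (1 - t1) * (m1 - m2) + (1 - t2) * (m2 - m3)
            implied = (1 - t2) ** 2 * (m2 - m3) ** 2 < (1 - t1) ** 2 * (m1 - m2) ** 2
            if not implied or c == 0.0:
                violations += 1
    return violations, checked


violations, checked = count_violations()
print(violations, checked)
```

A random search of this kind cannot replace the algebraic argument; it only guards against sign or index slips when transcribing the inequalities.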

Remark 1 (a) Regardless of the validity of Assumption A4, by (28) and Lemma 12, for $k \le k_1^0$

\[ ES_T(k) - ES_T(k_1^0) \ge T\{ER_{1T}(k) - ER_{1T}(k_1^0)\} \ge -M|k_1^0 - k|/T. \]

By symmetry (which can be thought of as reversing the data order), for $k > k_2^0$,

\[ ES_T(k) - ES_T(k_2^0) \ge T\{ER_{3T}(k) - ER_{3T}(k_2^0)\} \ge -M|k_2^0 - k|/T. \]

This property will be used later.

(b) Even if the strict inequality in Assumption A4 is replaced by an equality, i.e. $U(\tau_1^0) = U(\tau_2^0)$, the previous proof shows that Lemma 3 still holds for $k \in [1, k_1^0]$. The strict inequality is only needed for $k \in [k_1^0 + 1, k_2^0]$, which is considered below. □

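For $k \in [k_1^0 + 1, k_2^0]$, use the last equality of (22) with some algebra,

\[ ES_T(k) - ES_T(k_1^0) = (k - k_1^0)\Big[\frac{k_1^0}{k}(\mu_2 - \mu_1)^2 - \frac{(T - k_2^0)^2}{(T - k)(T - k_1^0)}(\mu_3 - \mu_2)^2\Big] + T\{ER_{2T}(k) - ER_{2T}(k_1^0)\}. \tag{31} \]

Factor out $k_2^0/k$, and use $k(T - k_2^0)/\{k_2^0(T - k)\} \le 1$ for all $k \le k_2^0$,

\[ ES_T(k) - ES_T(k_1^0) \ge (k - k_1^0)\frac{k_2^0}{k}\Big[\frac{k_1^0}{k_2^0}(\mu_2 - \mu_1)^2 - \frac{T - k_2^0}{T - k_1^0}(\mu_3 - \mu_2)^2\Big] + T\{ER_{2T}(k) - ER_{2T}(k_1^0)\}. \]

Denote $C^* = (\tau_1^0/\tau_2^0)(\mu_2 - \mu_1)^2 - [(1 - \tau_2^0)/(1 - \tau_1^0)](\mu_3 - \mu_2)^2$. By (6), $C^* > 0$. From

\[ \frac{k_1^0}{k_2^0} - \frac{\tau_1^0}{\tau_2^0} = O(T^{-1}), \qquad \frac{T - k_2^0}{T - k_1^0} - \frac{1 - \tau_2^0}{1 - \tau_1^0} = O(T^{-1}) \tag{32} \]

we have

\[ ES_T(k) - ES_T(k_1^0) \ge (k - k_1^0)C^* - (k - k_1^0)O(T^{-1}) + T\{ER_{2T}(k) - ER_{2T}(k_1^0)\} \ge (k - k_1^0)C^* - M\frac{k - k_1^0}{T} \]

for some $M < \infty$ by Lemma 12. Thus for large $T$,

\[ ES_T(k) - ES_T(k_1^0) \ge (k - k_1^0)C^*/2. \tag{33} \]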

It remains to consider $k \in [k_2^0 + 1, T]$. From Remark 1(a), $ES_T(k) - ES_T(k_2^0) \ge -T|ER_{3T}(k) - ER_{3T}(k_2^0)| \ge -(k - k_2^0)M/T \ge -(T - k_2^0)M/T$. Thus

\[ ES_T(k) - ES_T(k_1^0) = ES_T(k) - ES_T(k_2^0) + ES_T(k_2^0) - ES_T(k_1^0) \]

\[ \ge ES_T(k_2^0) - ES_T(k_1^0) - (T - k_2^0)M/T \]

\[ \ge (k - k_1^0)\Big[\frac{ES_T(k_2^0) - ES_T(k_1^0)}{T - k_1^0} - \frac{M}{T}\Big], \]

where the last inequality follows from $(k - k_1^0)/(T - k_1^0) \le 1$. Using (33) with $k = k_2^0$, we see that the term in the bracket is no smaller than $\frac{k_2^0 - k_1^0}{T - k_1^0}C^*/4$ for large $T$. Thus

\[ ES_T(k) - ES_T(k_1^0) \ge (k - k_1^0)\frac{\tau_2^0 - \tau_1^0}{1 - \tau_1^0}C^*/8 \tag{34} \]

for all large $T$. Combining (30), (33), and (34), we obtain Lemma 3. □
Proof of Lemma 4. Rewrite

\[ S_T(k) - S_T(k_1^0) = S_T(k) - ES_T(k) - [S_T(k_1^0) - ES_T(k_1^0)] + ES_T(k) - ES_T(k_1^0). \]

From Lemma 3, $S_T(k) - S_T(k_1^0) \le 0$ implies that

\[ \frac{S_T(k) - ES_T(k) - [S_T(k_1^0) - ES_T(k_1^0)]}{|k - k_1^0|} \le -C. \]

This further implies that the absolute value of the left hand side of the above is at least as large as $C$. We show this is unlikely for $k \in D_{T,M}$. More specifically, for every $\epsilon > 0$ and $\eta > 0$, there exists an $M > 0$ such that for all large $T$,

\[ P\Big( \sup_{k \in D_{T,M}} \Big| \frac{S_T(k) - ES_T(k) - \{S_T(k_1^0) - ES_T(k_1^0)\}}{k - k_1^0} \Big| > \eta \Big) < \epsilon. \]

First note that

\[ |S_T(k) - ES_T(k) - \{S_T(k_1^0) - ES_T(k_1^0)\}| = |T\{R_{1T}(k) - ER_{1T}(k)\} - T\{R_{1T}(k_1^0) - ER_{1T}(k_1^0)\}| \le |T\{R_{1T}(k) - R_{1T}(k_1^0)\}| + M'|k - k_1^0|/T \]

for some $M' < \infty$ by Lemma 12. Thus it suffices to show

\[ P\Big( \sup_{k \in D_{T,M}} \Big| \frac{T\{R_{1T}(k) - R_{1T}(k_1^0)\}}{k - k_1^0} \Big| > \eta \Big) < \epsilon. \tag{35} \]


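We consider the case $k < k_1^0$. From (18),

\[ T\{R_{1T}(k) - R_{1T}(k_1^0)\} = 2 a_{Tk} \sum_{t=k+1}^{k_1^0} X_t + 2\{b_{Tk} - b_T(k_1^0)\} \sum_{t=k_1^0+1}^{k_2^0} X_t + 2\{c_{Tk} - c_T(k_1^0)\} \sum_{t=k_2^0+1}^{T} X_t \]

\[ - (k_1^0 - k) a_{Tk} A^*_{Tk} - (k_2^0 - k_1^0)\{b_{Tk} A^*_{Tk} - b_T(k_1^0) A^*_T(k_1^0)\} \tag{36} \]

\[ - (T - k_2^0)\{c_{Tk} A^*_{Tk} - c_T(k_1^0) A^*_T(k_1^0)\} \]

\[ + \Big[\frac{1}{k_1^0}\Big(\sum_{t=1}^{k_1^0} X_t\Big)^2 - \frac{1}{k}\Big(\sum_{t=1}^{k} X_t\Big)^2\Big] + \big[(T - k_1^0)(A^*_T(k_1^0))^2 - (T - k)(A^*_{Tk})^2\big]. \]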

We shall show that each term on the right hand side divided by $k_1^0 - k$ is arbitrarily small in probability as long as $M$ is large and $T$ is large. Because $a_{Tk}$, $b_{Tk}$, $c_{Tk}$ are all uniformly bounded, with an upper bound, say, $L$, the first term divided by $k_1^0 - k$ is bounded by $2L\big|\frac{1}{k_1^0 - k}\sum_{t=k+1}^{k_1^0} X_t\big|$, which is uniformly small in $k \le k_1^0 - M$ for large $M$ by the strong law of large numbers. For the rest of the terms, we will use the following easily verifiable facts:

\[ |b_{Tk} - b_T(k_1^0)| \le \frac{k_1^0 - k}{T - k}\,C, \tag{37} \]

\[ |c_{Tk} - c_T(k_1^0)| \le \frac{k_1^0 - k}{T - k}\,C, \tag{38} \]

for some $C < \infty$, and

\[ A^*_T(k_1^0) - A^*_T(k) = \frac{k_1^0 - k}{(T - k)(T - k_1^0)} \sum_{t=k_1^0+1}^{T} X_t - \frac{1}{T - k} \sum_{t=k+1}^{k_1^0} X_t. \tag{39} \]

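In view of (37), the second term on the right hand side of (36) divided by $k_1^0 - k$ is bounded by

\[ 2C\Big|\frac{1}{T - k}\sum_{t=k_1^0+1}^{k_2^0} X_t\Big| \le 2C\,\frac{T}{T - k}\Big|\frac{1}{T}\sum_{t=k_1^0+1}^{k_2^0} X_t\Big| \le C'\Big|\frac{1}{T}\sum_{t=k_1^0+1}^{k_2^0} X_t\Big| \]

for some $C' < \infty$, which converges to zero in probability by the law of large numbers (note that $T - k \ge T(1 - \tau_2^0)$ for all $k \in D_T$). The third term is treated similarly.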
The fourth term divided by $k_1^0 - k$ is bounded by $L|A^*_{Tk}| = O_p(T^{-1/2})$ uniformly in $k \in D_T$. The fifth term can be rewritten as

\[ -(k_2^0 - k_1^0)\{b_{Tk} - b_T(k_1^0)\} A^*_{Tk} + (k_2^0 - k_1^0) b_T(k_1^0)\{A^*_T(k_1^0) - A^*_T(k)\}. \tag{40} \]

Using (37), the first expression of (40) divided by $k_1^0 - k$ is readily seen to be $o_p(1)$. The second expression divided by $k_1^0 - k$ is equal to, by (39),

\[ \frac{(k_2^0 - k_1^0)\, b_T(k_1^0)}{(T - k)(T - k_1^0)} \sum_{t=k_1^0+1}^{T} X_t - \frac{k_2^0 - k_1^0}{T - k}\, b_T(k_1^0)\, \frac{1}{k_1^0 - k} \sum_{t=k+1}^{k_1^0} X_t \tag{41} \]

with the first term being $o_p(1)$ and the second term being small for large $M$. Thus the fifth term of (36) is small if $M$ is large. The sixth term is treated similarly to the fifth one. It is also elementary to show that the seventh term and the eighth term of (36) divided by $k_1^0 - k$ can be arbitrarily small in probability provided that $M$ and $T$ are large. This proves Lemma 4 for $k < k_1^0$. The case of $k > k_1^0$ is similar; the details are omitted. □

Proof of Lemma 5. We prove the first inequality; the second follows from symmetry. For $k \le k_1^0$, the lemma is implied by Lemma 3, which holds for $U(\tau_1^0) = U(\tau_2^0)$; see Remark 1(b). Next, consider $k \in [k_1^0 + 1, k_2^0]$. From (31), (32), and the condition $U(\tau_1^0) = U(\tau_2^0)$ (i.e. $(\tau_1^0/\tau_2^0)(\mu_2 - \mu_1)^2 = [(1 - \tau_2^0)/(1 - \tau_1^0)](\mu_3 - \mu_2)^2$), we have

\[ ES_T(k) - ES_T(k_1^0) = (k - k_1^0)\Big(\frac{k_2^0}{k} - \frac{T - k_2^0}{T - k}\Big)\frac{\tau_1^0}{\tau_2^0}(\mu_2 - \mu_1)^2 + (k - k_1^0)O(T^{-1}) + T\{ER_{2T}(k) - ER_{2T}(k_1^0)\}. \tag{42} \]
Note  that  for  all  it  <  k^  =  (fc°  +  fc£)/2, 


K2  J-         »2  \ ""2         ""/-*     \  o-l  '     2         *l)-'     ^.    o    2         "'l 


0 


Jt     T-k     k{T-k)-1    k(T-k)  -      r     -2     *: 

The  last  two  terms  of  (42)  on  the  right  hand  side  are  dominated  by  the  first  term. 

The  lemma  is  proved.        □ 

Proof of Lemma 6. It is enough to prove the lemma for $i = 1$. The case of $i = 2$ follows from symmetry. The proof is virtually identical to that of Lemma 4. One uses Lemma 5 instead of Lemma 3. The rest can be copied here. □

Proof of Lemma 7. We shall prove $P(S_T(\hat k_1) - S_T(\hat k_2) < 0) \to 1/2$ or equivalently, $P(T^{-1/2}\{S_T(\hat k_1) - S_T(\hat k_2)\} < 0) \to 1/2$. Because $\hat k_i = k_i^0 + O_p(1)$, $S_T(\hat k_i) = S_T(k_i^0) + O_p(1)$ (see the proof of Proposition 4). It suffices therefore to prove

\[ P\big(T^{-1/2}\{S_T(k_1^0) - S_T(k_2^0)\} < 0\big) \to 1/2. \]

The equality of $U(\tau_1^0)$ and $U(\tau_2^0)$ translates into an approximate equality of $ES_T(k_1^0)$ and $ES_T(k_2^0)$. More precisely, using $|(k_i^0/T) - \tau_i^0| \le 1/T$, it is easy to show that $|ES_T(k_i^0) - TU(\tau_i^0)| \le A$ for some $A < \infty$. This implies $|ES_T(k_1^0) - ES_T(k_2^0)| \le 2A$. Thus

\[ T^{-1/2}\{S_T(k_1^0) - S_T(k_2^0)\} = \sqrt{T}\{R_{2T}(k_1^0) - R_{2T}(k_2^0)\} + O(T^{-1/2}) \]

where $R_{2T}(k_i^0) = \frac{1}{T}\{S_T(k_i^0) - ES_T(k_i^0)\}$ $(i = 1, 2)$, see (22). Note that we have used the fact that $S_T(k)$, when $k = k_1^0$, can be represented by both (17) and (22), and we have used (22). Consequently, it suffices to prove

\[ P\big(\sqrt{T}\{R_{2T}(k_1^0) - R_{2T}(k_2^0)\} < 0\big) \to 1/2. \]

From (23),

\[ T^{1/2} R_{2T}(k_1^0) = 2 f_T(k_1^0)\frac{1}{\sqrt{T}} \sum_{t=k_1^0+1}^{k_2^0} X_t + 2 g_T(k_1^0)\frac{1}{\sqrt{T}} \sum_{t=k_2^0+1}^{T} X_t + O_p(T^{-1/2}). \]

The above follows from $(k_2^0 - k_1^0) f_T(k_1^0) + (T - k_2^0) g_T(k_1^0) = 0$ and $(T - k_1^0)T^{-1/2}(A^*_T(k_1^0))^2 = O_p(T^{-1/2})$. Similarly,

\[ T^{1/2} R_{2T}(k_2^0) = 2 d_T(k_2^0)\frac{1}{\sqrt{T}} \sum_{t=1}^{k_1^0} X_t + 2 e_T(k_2^0)\frac{1}{\sqrt{T}} \sum_{t=k_1^0+1}^{k_2^0} X_t + O_p(T^{-1/2}). \]

Thus $T^{1/2}\{R_{2T}(k_1^0) - R_{2T}(k_2^0)\}$ converges in distribution to a mean zero normal random variable by the central limit theorem. The lemma follows because a mean zero normal random variable is symmetric about zero. □

Proof of Proposition 4. Consider the process $S_T(k_1^0 + \ell) - S_T(k_1^0)$ indexed by $\ell$, where $\ell$ is an integer (positive or negative). Suppose that the minimum of this process is attained at $\hat\ell$. By definition, $\hat\ell = \hat k - k_1^0$. By Proposition 2, for each $\epsilon > 0$, there exists an $M < \infty$ such that $P(|\hat k - k_1^0| > M) = P(|\hat\ell| > M) < \epsilon$. Thus to study the limiting distribution of $\hat\ell = \hat k - k_1^0$, it suffices to study the behavior of $S_T(k_1^0 + \ell) - S_T(k_1^0)$ for bounded $\ell$. We shall prove that $S_T(k_1^0 + \ell) - S_T(k_1^0)$ converges in distribution for each $\ell$ to $(1 + \lambda_1)W^{(1)}(\ell, \lambda_1)$, where $\lambda_1$ and $W^{(1)}(\ell, \lambda_1)$ are defined in the text. This will imply $\hat k - k_1^0 \to_d \arg\min_\ell (1 + \lambda_1)W^{(1)}(\ell, \lambda_1)$; see Bai (1994b). Because $(1 + \lambda_1) > 0$, $\arg\min_\ell (1 + \lambda_1)W^{(1)}(\ell, \lambda_1) = \arg\min_\ell W^{(1)}(\ell, \lambda_1)$, giving rise to the proposition. First consider the case of $\ell > 0$ and $\ell \le M$, where $M > 0$ is an arbitrary finite number. Let

\[ \hat\mu_1^* = \frac{1}{k_1^0 + \ell} \sum_{t=1}^{k_1^0+\ell} Y_t \quad \text{and} \quad \hat\mu_2^* = \frac{1}{T - k_1^0 - \ell} \sum_{t=k_1^0+\ell+1}^{T} Y_t, \tag{43} \]

\[ \hat\mu_1 = \frac{1}{k_1^0} \sum_{t=1}^{k_1^0} Y_t \quad \text{and} \quad \hat\mu_2 = \frac{1}{T - k_1^0} \sum_{t=k_1^0+1}^{T} Y_t. \tag{44} \]

Thus $\hat\mu_1^*$ is the least squares estimator of $\mu_1$ using the first $k_1^0 + \ell$ observations and $\hat\mu_2^*$ is the least squares estimator of a weighted average of $\mu_2$ and $\mu_3$ using the last $T - k_1^0 - \ell$ observations. The interpretation of $\hat\mu_i$ $(i = 1, 2)$ is similar. The estimators $\hat\mu_i^*$ $(i = 1, 2)$ depend on $\ell$. This dependence will be suppressed for notational simplicity. It is straightforward to establish the following result:

\[ \hat\mu_1^* - \mu_1 = O_p(T^{-1/2}) \quad \text{and} \quad \hat\mu_1 - \mu_1 = O_p(T^{-1/2}) \tag{45} \]

\[ \hat\mu_2^* - \mu_2 - \frac{1 - \tau_2^0}{1 - \tau_1^0}(\mu_3 - \mu_2) = O_p(T^{-1/2}) \quad \text{and} \quad \hat\mu_2 - \mu_2 - \frac{1 - \tau_2^0}{1 - \tau_1^0}(\mu_3 - \mu_2) = O_p(T^{-1/2}) \tag{46} \]

\[ \hat\mu_i^* - \hat\mu_i = O_p(T^{-1}) \quad (i = 1, 2) \tag{47} \]

where the $O_p(\cdot)$ terms are uniform in $\ell$ such that $|\ell| \le M$. Now,

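\[ S_T(k_1^0 + \ell) = \sum_{t=1}^{k_1^0} (Y_t - \hat\mu_1^*)^2 + \sum_{t=k_1^0+1}^{k_1^0+\ell} (Y_t - \hat\mu_1^*)^2 + \sum_{t=k_1^0+\ell+1}^{T} (Y_t - \hat\mu_2^*)^2. \tag{48} \]

Similarly,

\[ S_T(k_1^0) = \sum_{t=1}^{k_1^0} (Y_t - \hat\mu_1)^2 + \sum_{t=k_1^0+1}^{k_1^0+\ell} (Y_t - \hat\mu_2)^2 + \sum_{t=k_1^0+\ell+1}^{T} (Y_t - \hat\mu_2)^2. \tag{49} \]

The difference between the two first terms on the right hand sides of (48) and (49), respectively, is

\[ \sum_{t=1}^{k_1^0} (Y_t - \hat\mu_1^*)^2 - \sum_{t=1}^{k_1^0} (Y_t - \hat\mu_1)^2 = k_1^0(\hat\mu_1^* - \hat\mu_1)^2 = O_p(T^{-1}). \tag{50} \]

Similarly, the difference between the two third terms on the right hand sides of (48) and (49), respectively, is also $O_p(T^{-1})$. Next consider the difference between the two middle terms. For $t \in [k_1^0 + 1, k_1^0 + \ell]$, $Y_t = \mu_2 + X_t$. Hence

\[ \sum_{t=k_1^0+1}^{k_1^0+\ell} (Y_t - \hat\mu_1^*)^2 - \sum_{t=k_1^0+1}^{k_1^0+\ell} (Y_t - \hat\mu_2)^2 = 2\{\mu_2 - \hat\mu_1^* - (\mu_2 - \hat\mu_2)\} \sum_{t=k_1^0+1}^{k_1^0+\ell} X_t + \ell\{(\mu_2 - \hat\mu_1^*)^2 - (\mu_2 - \hat\mu_2)^2\}. \tag{51} \]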
From (45) and (46), we have

\[ \mu_2 - \hat\mu_1^* - (\mu_2 - \hat\mu_2) = (\mu_2 - \mu_1)(1 + \lambda_1) + O_p(T^{-1/2}) \]

and

\[ (\mu_2 - \hat\mu_1^*)^2 - (\mu_2 - \hat\mu_2)^2 = (\mu_2 - \mu_1)^2(1 - \lambda_1^2) + O_p(T^{-1/2}). \]

Thus (51) is equal to

\[ 2(\mu_2 - \mu_1)(1 + \lambda_1) \sum_{t=k_1^0+1}^{k_1^0+\ell} X_t + \ell(\mu_2 - \mu_1)^2(1 - \lambda_1^2) + O_p(T^{-1/2}). \tag{52} \]

Under strict stationarity, $\sum_{t=k_1^0+1}^{k_1^0+\ell} X_t$ has the same distribution as $\sum_{t=1}^{\ell} X_t$. Thus (52) or, equivalently, (51) converges in distribution to $(1 + \lambda_1)W^{(1)}(\ell, \lambda_1)$. This implies that $S_T(k_1^0 + \ell) - S_T(k_1^0)$ converges in distribution to $(1 + \lambda_1)W^{(1)}(\ell, \lambda_1)$ for $\ell > 0$.
It remains to consider $\ell < 0$. We replace $\ell$ by $-\ell$ and still consider a positive $\ell$. In particular, $\hat\mu_1^*$ and $\hat\mu_2^*$ are defined with $-\ell$ in place of $\ell$. Then (48) and (49) are replaced, respectively, by

\[ S_T(k_1^0 - \ell) = \sum_{t=1}^{k_1^0-\ell} (Y_t - \hat\mu_1^*)^2 + \sum_{t=k_1^0-\ell+1}^{k_1^0} (Y_t - \hat\mu_2^*)^2 + \sum_{t=k_1^0+1}^{T} (Y_t - \hat\mu_2^*)^2 \tag{53} \]

and

\[ S_T(k_1^0) = \sum_{t=1}^{k_1^0-\ell} (Y_t - \hat\mu_1)^2 + \sum_{t=k_1^0-\ell+1}^{k_1^0} (Y_t - \hat\mu_1)^2 + \sum_{t=k_1^0+1}^{T} (Y_t - \hat\mu_2)^2. \tag{54} \]

The major distinction between (48) and (53) lies in the change of $\hat\mu_1^*$ to $\hat\mu_2^*$ for the middle terms on the right hand side. One can observe a similar change for (49) and (54). Similar to (50), the difference between the two first terms on the right hand sides of (53) and (54) is $O_p(T^{-1})$. The same is true for the difference between the two third terms on the right hand sides. Using $Y_t = \mu_1 + X_t$ for $t \le k_1^0$, we have

\[ \sum_{t=k_1^0-\ell+1}^{k_1^0} (Y_t - \hat\mu_2^*)^2 - \sum_{t=k_1^0-\ell+1}^{k_1^0} (Y_t - \hat\mu_1)^2 \]

\[ = 2(\hat\mu_1 - \hat\mu_2^*) \sum_{t=k_1^0-\ell+1}^{k_1^0} X_t + \ell(\mu_1 - \hat\mu_2^*)^2 + O_p(T^{-1/2}) \]

\[ = -2(\mu_2 - \mu_1)(1 + \lambda_1) \sum_{t=k_1^0-\ell+1}^{k_1^0} X_t + \ell(\mu_2 - \mu_1)^2(1 + \lambda_1)^2 + O_p(T^{-1/2}). \]

Ignoring the $O_p(T^{-1/2})$ term and using strict stationarity, we see that the above has the same distribution as $(1 + \lambda_1)W^{(1)}(-\ell, \lambda_1)$. In summary, we have proved that $S_T(k_1^0 + \ell) - S_T(k_1^0)$ converges in distribution to $(1 + \lambda_1)W^{(1)}(\ell, \lambda_1)$. This convergence implies that $\hat k - k_1^0 \to_d \arg\min_\ell (1 + \lambda_1)W^{(1)}(\ell, \lambda_1) = \arg\min_\ell W^{(1)}(\ell, \lambda_1)$. The proposition is proved.
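The asymmetry of the limiting distribution can be visualized by simulating the argmin of a two-sided random walk with side-specific drift. The sketch below is only an illustration under stated assumptions: the form of $W^{(1)}(\ell, \lambda)$ is the one read off from (52) and its counterpart for $\ell < 0$ (drift $(1-\lambda)$ on the right, $(1+\lambda)$ on the left), the errors are taken to be i.i.d. standard normal, the search is truncated at `max_lag`, and the names `argmin_w`, `delta` (standing for $\mu_2 - \mu_1$), and `lam` are hypothetical.

```python
import random


def argmin_w(delta, lam, rng, max_lag=200):
    """Draw one argmin over |l| <= max_lag of the two-sided process
    W(l) =  2*delta*sum_{t<=l} X_t + l*delta^2*(1 - lam)    for l > 0,
    W(l) = -2*delta*sum X_t       + |l|*delta^2*(1 + lam)   for l < 0,
    with W(0) = 0 and i.i.d. N(0, 1) errors (an assumption of this sketch)."""
    best_l, best_w = 0, 0.0
    for sign, drift in ((1, 1.0 - lam), (-1, 1.0 + lam)):
        partial_sum = 0.0
        for l in range(1, max_lag + 1):
            partial_sum += rng.gauss(0.0, 1.0)
            w = 2.0 * sign * delta * partial_sum + l * delta ** 2 * drift
            if w < best_w:
                best_l, best_w = sign * l, w
    return best_l


rng = random.Random(0)
draws = [argmin_w(delta=1.0, lam=0.5, rng=rng) for _ in range(2000)]
share_pos = sum(d > 0 for d in draws) / len(draws)
share_neg = sum(d < 0 for d in draws) / len(draws)
print(share_pos, share_neg)
```

With a positive `lam` the drift is weaker on the right side, so the simulated minimizer falls to the right of zero more often than to the left; setting `lam=0` in this sketch gives a symmetric design.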

Proof of Proposition 5. The argument is virtually the same as in the proof of Proposition 4. The reason for $\lambda = 0$ is that regression coefficients can be consistently estimated in this case, in contrast with the inconsistent estimation given in (46). The details will not be presented to avoid repetition. □

Proof of Lemma 8. From the identity (2), $S_N - S_N(k) = N V_N(k)^2$, where $V_N(k) = \{(k/N)(1 - k/N)\}^{1/2}(\bar Y_k^* - \bar Y_k)$. It is enough to consider $k$ such that $k \in [n\eta, n(1 - \eta)]$ because $N$ and $n$ are of the same order. Now

N^2VN(k) 

,  -I  n+n2  1  k 

=  N^{(k/N)(i  -  k/N)yi\      l    -  y,  x*  -  rf-  £  x* 

\n  +  n2-k  k+1  k  +  nx  _nj+1 

n2                         n2  «i  ,       ni       \ 

A*2  -  — -7— TJ1  ~  1.  ■  _  Pi  +  1.  .  _  A» ) 


n  +  n,2  —  k  n  +  712  —  k         k  +  ni  k  +  ni 

n+n2 


Vn  +  ^-fc^j  K  +  ni_ni+1  n/ 

=    »"2{(*/<0(l  -  kW  (^  E  *'  ~  £  E  *)  +  0p(n-"2) 

where the second equality follows from $n_i = O_p(1)$ and $k^{-1} = O(n^{-1})$; the third follows from the asymptotic equivalence of $N$ and $n$; the fourth follows from some simple algebra. For $k = [n\tau]$, $N^{1/2}V_N(k)$ converges in distribution to $a(1)\sigma_\epsilon\{\tau(1 - \tau)\}^{-1/2}[\tau B(1) - B(\tau)]$. This gives the finite dimensional convergence. The rest follows from the functional central limit theorem and the continuous mapping theorem. □
Proof of Proposition 9. By the $T$ consistency of $\hat k_{i-1}$ and $\hat k_{i+1}$, we see that $k_i^0$ is a nontrivial and dominating break point in the interval $[\hat k_{i-1}, \hat k_{i+1}]$. Thus the $T$ consistency of $\hat k_i^*$ for $k_i^0$ follows from the property of the sequential estimator. The argument for the limiting distribution is the same as that of Proposition 5. □
Proof  of  Lemma  9. 
Proof of (a). First consider $k \le k_1^0$. From (17),

\[ \Big| U_T(k/T) - EU_T(k/T) - T^{-1}\sum_{t=1}^{T} (X_t^2 - EX_t^2) \Big| = |R_{1T}(k) - ER_{1T}(k)|. \tag{55} \]

The first two terms of $R_{1T}(k)$ [see (18)] are linear in $\mu_{iT}$, and thus are $v_T O_p(T^{-1/2})$. The last two terms do not depend on $\mu_{iT}$, but are of higher order than $v_T O_p(T^{-1/2})$. Moreover, $ER_{1T}(k) = O(T^{-1})$ uniformly in $k$. Thus $|R_{1T}(k) - ER_{1T}(k)| = O_p(T^{-1/2}v_T)$ uniformly in $k \le k_1^0$. This proves the lemma for $k \le k_1^0$. The proof for $k > k_1^0$ is the same and follows from $R_{iT}(k) - ER_{iT}(k) = O_p(T^{-1/2}v_T)$ $(i = 2, 3)$.

Proof of (b). Consider first $k \le k_1^0$. By the second equality of (28), the first term of $ES_T(k) - ES_T(k_1^0)$ on the right hand side depends on the squared and the cross product of $\mu_{iT} - \mu_{(i+1)T}$ $(i = 1, 2)$ (hence on $v_T^2$). Factor out $v_T^2$ and replace $\mu_{jT}$ by $\tilde\mu_j$; the rest of the proof will be the same as that of Lemma 3. This implies that

\[ ES_T(k) - ES_T(k_1^0) \ge v_T^2(k_1^0 - k)C^2/2 \]

where $C$ is given by (29) with $\tilde\mu_j$ in place of $\mu_j$. The proof for $k > k_1^0$ is similar and the details are omitted. □

Proof of Lemma 10. As in the proof of Lemma 4, it suffices to show that for every $\epsilon > 0$ and $\eta > 0$, there exists an $M > 0$ such that

\[ P\Big( \sup_{k \in D^*_{T,M}} \Big| \frac{T\{R_{1T}(k) - R_{1T}(k_1^0)\}}{k - k_1^0} \Big| > \eta v_T^2 \Big) < \epsilon. \tag{56} \]

The above is similar to (35) with $\eta$ replaced by $\eta v_T^2$ and $D_{T,M}$ replaced by $D^*_{T,M}$. Note for $k \in D^*_{T,M}$, we either have $k < k_1^0 - Mv_T^{-2}$ or $k > k_1^0 + Mv_T^{-2}$. Consider $k < k_1^0 - Mv_T^{-2}$. We need to show that each term on the right hand side of (36) divided by $k_1^0 - k$ is no larger than $\eta v_T^2$ as long as $M$ is large and $T$ is large. The proof requires the Hajek and Renyi inequality, extended to linear processes by Bai (1994a): there exists a $C_1 < \infty$ such that for each $\ell > 0$ and $\alpha > 0$,

\[ P\Big( \sup_{k \ge \ell} \Big| \frac{1}{k}\sum_{t=1}^{k} X_t \Big| > \alpha \Big) \le \frac{C_1}{\alpha^2 \ell}. \]

Now consider the first term on the right hand side of (36). Note that $|a_{Tk}| \le v_T L$ for some $L < \infty$. Thus it is enough to show

\[ P\Big( \sup_{k \le k_1^0 - Mv_T^{-2}} \Big| \frac{1}{k_1^0 - k}\sum_{t=k+1}^{k_1^0} X_t \Big| > \eta v_T L^{-1} \Big) < \epsilon \]

for large $M$. By the Hajek and Renyi inequality (applied with the data order reversed by treating $k_1^0$ as 1),

\[ P\Big( \sup_{k \le k_1^0 - Mv_T^{-2}} \Big| \frac{1}{k_1^0 - k}\sum_{t=k+1}^{k_1^0} X_t \Big| > \eta v_T L^{-1} \Big) \le \frac{C_1 L^2}{\eta^2 v_T^2 M v_T^{-2}} = \frac{C_1 L^2}{\eta^2 M}. \]

The above probability is small if $M$ is large. The proof of Lemma 4 demonstrates that all other terms are of lower or equal magnitude than the term just treated. This proves the lemma for $k$ less than $k_1^0$. The proof for $k > k_1^0$ is analogous. □

Proof of Proposition 11. The proof is similar to that of Proposition 4. We only outline the major distinction. In view of the rate of convergence, we consider the process $\Lambda_T(s) = S_T(k_1^0 + [sv_T^{-2}]) - S_T(k_1^0)$, indexed by a real number $s$. We shall derive the limiting process for $|s| \le M$ for an arbitrary given $M < \infty$. Let $D[-M, M]$ denote the space of cadlag functions endowed with the Skorohod metric, see Pollard (1984). We shall show that $\Lambda_T(s)$ converges weakly in $D[-M, M]$ to a pertinent limiting process. First consider $s > 0$. Let $\ell = [sv_T^{-2}]$. Define $\hat\mu_i^*$ and $\hat\mu_i$ as in (43) and (44), respectively. Then (45)-(47) still hold with $\mu_j$ interpreted as $\mu_{jT}$. For example,

\[ \sqrt{T}(\hat\mu_1^* - \mu_{1T}) = \frac{\sqrt{T}\,\ell(\mu_{2T} - \mu_{1T})}{k_1^0 + \ell} + \frac{\sqrt{T}}{k_1^0 + \ell}\sum_{t=1}^{k_1^0+\ell} X_t = O_p(1). \]

This follows because, from $|\ell| \le Mv_T^{-2}$, the first term on the right hand side is of $O(1/(\sqrt{T}v_T))$, which converges to zero, and the second term is $O_p(1)$. Equations (48)-(49) are simply identities and still hold here. Similar to the proof of Proposition 4, the difference between the two first terms and the difference between the two third terms of (48) and (49) converge to zero in probability. Equation (52) in the present case is reduced to

\[ 2(1 + \lambda_1)(\tilde\mu_2 - \tilde\mu_1)\, v_T \sum_{t=k_1^0+1}^{k_1^0+\ell} X_t + \ell v_T^2(\tilde\mu_2 - \tilde\mu_1)^2(1 - \lambda_1^2) + o_p(1). \]

Note that $\lambda_1$ is free from $v_T$ because $v_T$ is canceled out due to its presence in both the denominator and the numerator. From $\ell = [sv_T^{-2}]$, using the functional central limit theorem for linear processes [e.g., Phillips and Solo (1992)],

\[ v_T \sum_{t=k_1^0+1}^{k_1^0+[sv_T^{-2}]} X_t = v_T \sum_{t=1}^{[sv_T^{-2}]} X_{t+k_1^0} \Rightarrow a(1)\sigma_\epsilon B_2(s) \]

in the space $D[0, M]$, where $B_2(s)$ is a Brownian motion process on $[0, \infty)$, and

\[ \ell v_T^2 = [sv_T^{-2}]v_T^2 \to s, \quad \text{uniformly in } s \in [0, M]. \]
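In summary, for $s > 0$,

\[ S_T(k_1^0 + [sv_T^{-2}]) - S_T(k_1^0) \Rightarrow 2(1 + \lambda_1)(\tilde\mu_2 - \tilde\mu_1)a(1)\sigma_\epsilon B_2(s) + s(\tilde\mu_2 - \tilde\mu_1)^2(1 - \lambda_1^2). \]

The same analysis shows that for $s < 0$,

\[ S_T(k_1^0 + [sv_T^{-2}]) - S_T(k_1^0) \Rightarrow 2(1 + \lambda_1)(\tilde\mu_2 - \tilde\mu_1)a(1)\sigma_\epsilon B_1(-s) + |s|(\tilde\mu_2 - \tilde\mu_1)^2(1 + \lambda_1)^2, \]

where $B_1(\cdot)$ is another Brownian motion process on $[0, \infty)$ independent of $B_2(\cdot)$. Introduce

\[ \Gamma(s, \lambda) = \begin{cases} 2a(1)\sigma_\epsilon B_1(-s) + |s|(1 + \lambda) & \text{if } s < 0 \\ 2a(1)\sigma_\epsilon B_2(s) + |s|(1 - \lambda) & \text{if } s > 0 \end{cases} \]

with $\Gamma(0, \lambda) = 0$. The process $\Gamma$ differs from $\Lambda$ in the extra term $a(1)\sigma_\epsilon$. By a change of variable, it can be shown that $\arg\min_s \Gamma(s, \lambda) = a(1)^2\sigma_\epsilon^2 \arg\min_s \Lambda(s, \lambda)$. Now because $cB_i(s)$ has the same distribution as $B_i(c^2 s)$, we have

\[ S_T(k_1^0 + [sv_T^{-2}]) - S_T(k_1^0) \Rightarrow (1 + \lambda_1)\Gamma((\tilde\mu_2 - \tilde\mu_1)^2 s, \lambda_1). \]

This implies that

\[ Tv_T^2(\hat\tau - \tau_1^0) \to_d \arg\min_s (1 + \lambda_1)\Gamma((\tilde\mu_2 - \tilde\mu_1)^2 s, \lambda_1) = (\tilde\mu_2 - \tilde\mu_1)^{-2}\arg\min_v \Gamma(v, \lambda_1) = (\tilde\mu_2 - \tilde\mu_1)^{-2}a(1)^2\sigma_\epsilon^2 \arg\min_v \Lambda(v, \lambda_1). \]

We have used the fact that $\arg\min_x a f(x) = \arg\min_x f(x)$ for $a > 0$ and $\arg\min_x f(a^2 x) = a^{-2}\arg\min_x f(x)$ for an arbitrary function $f(x)$. □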
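The two argmin facts used in the last step can be checked on a grid for any function with a unique minimizer. In the sketch below the test function `f`, the scale `a`, and the grid are arbitrary choices made for illustration; `f` has its minimum at 2.5.

```python
def argmin(f, grid):
    """Return the grid point at which f is smallest."""
    return min(grid, key=f)


def f(x):
    """Arbitrary test function with a unique minimizer at x = 2.5."""
    return (x - 3.0) ** 2 + abs(x)


a = 2.0
grid = [i / 1000 for i in range(-10000, 10001)]

x_f = argmin(f, grid)                                # argmin_x f(x)
x_scaled = argmin(lambda x: f(a * a * x), grid)      # argmin_x f(a^2 x)
x_mult = argmin(lambda x: 5.0 * f(x), grid)          # argmin_x 5 f(x)
print(x_f, x_scaled, x_mult)
```

Rescaling the argument by $a^2$ shrinks the minimizer by the same factor, while multiplying the function by a positive constant leaves the minimizer unchanged.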
Figure 1: Histograms of the estimated break points for Model (I): (a) Sequential method; (b) Repartition method; (c) Simultaneous method.

Figure 2: Histograms of the estimated break points for Model (II): (a) Sequential method; (b) Repartition method; (c) Simultaneous method.

Figure 3: Histograms of the estimated numbers (of breaks) for Models (I), (II) and (III). Sequential estimation: (a), (b), (c). BIC criterion: (a'), (b'), (c').